(54) Title: MUTANT AEQUOREA VICTORIA FLUORESCENT PROTEINS HAVING INCREASED CELLULAR FLUORESCENCE
(57) Abstract 

The present invention is directed to mutants of the jellyfish Aequorea victoria green fluorescent protein (GFP) having at least 5 and
preferably greater than 20 times the specific green fluorescence of the wild type protein. In other embodiments, the invention comprises
mutant blue fluorescent proteins (BFPs) that emit an enhanced blue fluorescence. The invention also encompasses the expression of
mutant GFP or BFP in a wide variety of engineered host cells, and the isolation of engineered proteins having
increased fluorescent activity. The novel mutants of the present invention allow for a significantly more sensitive detection of fluorescence
in host cells than is possible with GFP or with its mutants. Thus, the mutant fluorescent proteins provided herein can be
used as sensitive reporter molecules to detect the cell and tissue-specific expression and subcellular compartmentalization of GFP or BFP
mutants, or of chimeric proteins comprising GFP or BFP mutants fused to a regulatory sequence or to a second protein sequence.
MUTANT AEQUOREA VICTORIA FLUORESCENT PROTEINS
HAVING INCREASED CELLULAR FLUORESCENCE 

FIELD OF THE INVENTION

This invention generally relates to novel proteins 
and their production which are useful for detecting gene 
expression and for visualizing the subcellular targeting and 
distribution of selected proteins and peptides, among other 
things. The invention specifically relates to mutations in 
the gene coding for the jellyfish Aequorea victoria green
fluorescent protein ("GFP"), which mutations encode mutant GFP
proteins having either an enhanced green or a blue 
fluorescence, and uses for them. 

BACKGROUND OF THE INVENTION 

Green fluorescent protein ("GFP") is a monomeric 
protein of about 27 kDa which can be isolated from the 
bioluminescent jellyfish Aequorea victoria. When wild type 
GFP is illuminated by blue or ultraviolet light, it emits a 
brilliant green fluorescence. Similar to fluorescein 
isothiocyanate, GFP absorbs ultraviolet and blue light with a 
maximum absorbance at 395 nm and a minor peak of absorbance at
470 nm, and emits green light with a maximum emission at 509 
nm with a minor peak at 540 nm. GFP fluorescence persists 
even after fixation with formaldehyde, and it is more stable 
to photobleaching than fluorescein. 

The gene for GFP has been isolated and sequenced. 
Prasher, D. C. et al. (1992), "Primary structure of the
Aequorea victoria green fluorescent protein," Gene 111:229-233.
Expression vectors that comprise the GFP gene or cDNA
have been introduced into a variety of host cells. These host
cells include: Chinese hamster ovary (CHO) cells, human
embryonic kidney cells (HEK293), COS-1 monkey cells, myeloma
cells, NIH 3T3 mouse fibroblasts, PtK1 cells, BHK cells, PC12
cells, Xenopus, leech, transgenic zebra fish, transgenic mice,
Drosophila and several plants. The GFP molecules expressed by
these different cells have a fluorescence similar to that of the
native molecules, demonstrating that GFP fluorescence does
not require any species-specific cofactors or substrates.
See, e.g., Baulcombe, D. et al. (1995), "Jellyfish green
fluorescent protein as a reporter for virus infections," The
Plant Journal 7:1045-1053; Chalfie, M. et al. (1994), "Green
fluorescent protein as a marker for gene expression," Science
263:802-805; Inouye, S. & Tsuji, F. (1994), "Aequorea green
fluorescent protein: expression of the gene and fluorescent
characteristics of the recombinant protein," FEBS Letters
341:277-280; Inouye, S. & Tsuji, F. (1994), "Evidence for
redox forms of the Aequorea green fluorescent protein," FEBS
Letters 351:211-214; Kain, S. et al. (1995), "The green
fluorescent protein as a reporter of gene expression and
protein localization," BioTechniques (in press); Kitts, P. et
al. (1995), "Green Fluorescent Protein (GFP): A novel reporter
for monitoring gene expression in living organisms,"
CLONTECHniques X(1):1-3; Lo, D. et al. (1994), "Neuronal
transfection in brain slices using particle-mediated gene
transfer," Neuron 13:1263-1268; Moss, J. B. & Rosenthal, N.
(1994), "Analysis of gene expression patterns in the embryonic
mouse myotome with the green fluorescent protein, a new vital
marker," J. Cell. Biochem., Supplement 18D:W161; Niedz, R. et
al. (1995), "Green fluorescent protein: an in vivo reporter of
plant gene expression," Plant Cell Reports 14:403-406; Wu,
G.-I. et al. (1995), "Infection of frog neurons with vaccinia
virus permits in vivo expression of foreign proteins," Neuron
14:681-684; Yu, J. & van den Engh, G. (1995), "Flow-sort and
growth of single bacterial cells transformed with cosmid and
plasmid vectors that include the gene for green-fluorescent
protein as a visible marker," Abstracts of papers presented at
the 1995 meeting on "Genome Mapping and Sequencing," Cold
Spring Harbor, p. 293.

The active GFP chromophore is a hexapeptide which 
contains a cyclized Ser-dehydroTyr-Gly trimer at positions 65-67.
This chromophore is only fluorescent when embedded
within the intact GFP protein. Chromophore formation occurs 
post-translationally; nascent GFP is not fluorescent. The 
chromophore is thought to be formed by a cyclization reaction 
and an oxidation step that requires molecular oxygen. 

Proteins can be fused to the amino (N-) or carboxy 
(C-) terminus of GFP. Such fused proteins have been shown to 
retain the fluorescent properties of GFP and the functional 
properties of the fusion partner. Bian, J. et al. (1995),
"Nuclear localization of HIV-1 matrix protein P17: The use of
A. victoria GFP in protein tagging and tracing," FASEB J.
9:A1279; Flach, J. et al. (1994), "A yeast RNA-binding
protein shuttles between the nucleus and the cytoplasm," Mol.
Cell. Biol. 14:8399-8407; Marshall, J. et al. (1995), "The
jellyfish green fluorescent protein: a new tool for studying
ion channel expression and function," Neuron 14:211-215;
Olmsted, J. et al. (1994), "Green Fluorescent Protein (GFP)
chimeras as reporters for MAP4 behavior in living cells," Mol.
Biol. of the Cell 5:167a; Rizzuto, R. et al. (1995), "Chimeric
green fluorescent protein as a tool for visualizing
subcellular organelles in living cells," Current Biol.
5:635-642; Sengupta, P. et al. (1994), "The C. elegans gene
odr-7 encodes an olfactory-specific member of the nuclear
receptor superfamily," Cell 79:971-980; Stearns, T. (1995),
"The green revolution," Current Biol. 5:262-264; Treinin, M. &
Chalfie, M. (1995), "A mutated acetylcholine receptor subunit
causes neuronal degeneration in C. elegans," Neuron 14:871-
877; Wang, S. & Hazelrigg, T. (1994), "Implications for bcd
mRNA localization from spatial distribution of exu protein in
Drosophila oogenesis," Nature 369:400-403.

A number of GFP mutants have been reported. 
Delagrave, S. et al. (1995) "Red-shifted excitation mutants of 
the green fluorescent protein," Bio/Technology 13:151-154; 
Heim, R. et al. (1994) "Wavelength mutations and
posttranslational autoxidation of green fluorescent protein,"
Proc. Natl. Acad. Sci. USA 91:12501-12504; Heim, R. et al.
(1995), "Improved green fluorescence," Nature 373:663-664.
Delagrave et al. (1995) Bio/Technology 13:151-154 isolated
mutants of cloned Aequorea victoria GFP that had red-shifted
excitation spectra. Heim, R. et al. (1994) "Wavelength
mutations and posttranslational autoxidation of green
fluorescent protein," Proc. Natl. Acad. Sci. USA 91:12501-
12504 reported a mutant (Tyr66 to His) having a blue
fluorescence, which is herein designated BFP(Tyr67→His).
These references have neither taught nor suggested that their 
mutations resulted in an increase in the cellular fluorescence 
of the mutant GFPs. 

In general, the level of fluorescence of a protein
expressed in a cell depends on several factors, such as number
of copies made of the fluorescent protein, stability of the
protein, efficiency of formation of the chromophore, and
interactions with cellular solvents, solutes and structures.
Although the fluorescent signal from wild type GFP or from the
reported mutants is generally adequate for bulk detection of
abundantly expressed GFP or of GFP-containing chimeras, it is
inadequate for detecting transient low or constitutively low
levels of expression, or for performing fine structural
subcellular localizations. This limitation severely restricts
the use of native GFP or of the reported mutants as a
biochemical and structural marker for gene expression and
morphological studies.

SUMMARY OF THE INVENTION

It is an object of the invention to provide engineered
GFP-encoding nucleic acid sequences that encode modified GFP
molecules having a greater cellular fluorescence than wild
type GFP or prior described recombinant GFP.

It is a further object of this invention to provide
recombinant vectors containing these modified GFP-encoding
nucleic acid sequences, which vectors are capable of being
inserted into a variety of cells (including mammalian and
eukaryotic cells) and expressing the modified GFP.

It is also an object of this invention to provide 
host cells capable of providing useful quantities of 
homogeneous modified GFP. 
It is yet another object of this invention to
provide peptides that possess a greater cellular fluorescence 
than native GFP or unaltered recombinant GFP and that can be 
produced in large quantities in a laboratory, by a 
microorganism or by a cell in culture. 

These and other objects of the invention have been 
accomplished by providing mutant GFP-encoding nucleic acids 
whose gene product exhibits an increased cellular fluorescence 
relative to naturally occurring or recombinantly produced wild 
type GFP ("wtGFP"). In some embodiments, the modified GFPs
possess fluorescent activity that is 50-100 fold greater than 
that of unmodified GFP. 

The modified proteins of the present invention are 
produced by making mutations in a genetic sequence that 
result in alterations in the amino acid sequence of the 
resulting gene product. Our starting material was a GFP-
encoding nucleic acid wherein a codon encoding an additional
amino acid was inserted at position 2 of the previously
published GFP amino acid sequence (Chalfie et al., 1994), to
introduce a useful restriction site. Due to the amino acid
insertion at position 2 of the GFP amino acid sequence, our
numbering of the GFP amino acids and description of the amino
acid mutations is off by one as compared to the originally
reported wild type GFP sequence (Prasher et al., 1992). Thus,
amino acid 65 by our numbering corresponds to amino acid 64 of
the originally reported wild type GFP, amino acid 168 
corresponds to amino acid 167 of the originally reported wild 
type GFP, etc. 
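
The off-by-one relationship between the two numbering schemes can be sketched as a pair of small conversion helpers. This is only an illustration: the function and constant names are hypothetical, and positions at or immediately next to the insertion site are deliberately rejected, since only downstream positions (such as 65 and 168) are worked through in the text.

```python
# The single-residue insertion near the N-terminus shifts downstream
# positions by one. The constant name is illustrative; the value of 1
# comes from the text.
INSERTION_OFFSET = 1


def to_original_numbering(position: int) -> int:
    """Map a position in this document's numbering to the originally
    reported wild type GFP numbering (Prasher et al., 1992)."""
    if position <= 3:
        # Positions at or adjacent to the insertion site are left out of this sketch.
        raise ValueError("position is too close to the insertion site to map")
    return position - INSERTION_OFFSET


def to_document_numbering(original_position: int) -> int:
    """Map an originally reported position to this document's numbering."""
    if original_position <= 2:
        raise ValueError("position is too close to the insertion site to map")
    return original_position + INSERTION_OFFSET


if __name__ == "__main__":
    for pos in (65, 168):
        print(pos, "->", to_original_numbering(pos))
```

For example, the tyrosine reported at position 66 in the original numbering is referred to as position 67 here.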

Using the modified wild type GFP described herein, a 
number of the unique mutants described herein derive from the 
discovery of an unplanned and unexpected mutation called 
"SG12", obtained in the course of site-directed mutagenesis 
experiments, wherein a phenylalanine at position 65 of wtGFP 
was converted to leucine. A mutant referred to as "SG11," 
which combined the phenylalanine 65 to leucine alteration with 
an isoleucine 168 to threonine substitution and a lysine 239 
to asparagine substitution, gave a further enhanced 
fluorescence intensity. The lysine 239 to asparagine
substitution does not affect the fluorescence of GFP; indeed
the C-terminal lysine or asparagine may be deleted without
affecting fluorescence. A third and further improved GFP
mutant was obtained by further mutating "SG11." This mutant
is referred to as "SG25" and, in addition to the SG11
mutations, contains an additional mutation, a substitution of
a cysteine at position 66 for the serine normally found at
that position in the sequence.

In addition, the invention encompasses novel GFP
mutants that emit a blue fluorescence. These blue mutants are
derived from a mutation of the wild type GFP (Heim, R. et al.
(1994) "Wavelength mutations and posttranslational
autoxidation of green fluorescent protein," Proc. Natl. Acad.
Sci. USA 91:12501-12504), in which histidine was substituted
for tyrosine at amino acid position 66. This mutant emits a
blue fluorescence, i.e., it becomes a Blue Fluorescent Protein
(BFP).

Novel BFP mutants having an enhanced blue
fluorescence were made by further modifying this
BFP(Tyr67→His). The introduction of the same mutation used to
generate SG12 (i.e., phenylalanine to leucine at position 65)
into BFP(Tyr67→His) resulted in a new mutant having a brighter
fluorescence, designated "SuperBlue-42" (SB42). A second
independently generated mutation of BFP(Tyr67→His), in which a
valine at position 164 was converted to alanine, also emitted
an enhanced blue fluorescent signal and is referred to as
"SB49." A combination of the above two mutations resulted in
"SB50," which exhibited an even greater fluorescence
enhancement than either of the previous mutations.
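
The relationships among the named green and blue mutants can be summarized as sets of substitutions. The sketch below is only a bookkeeping aid: each substitution is written as original residue, position, new residue in one-letter code (using this document's numbering), and the variable and function names are hypothetical.

```python
# Green mutants, each built on the previous one.
SG12 = frozenset({"F65L"})
SG11 = SG12 | {"I168T", "K239N"}
SG25 = SG11 | {"S66C"}

# Blue mutants, all built on the Tyr67 -> His prototype.
BFP_PROTOTYPE = frozenset({"Y67H"})
SB42 = BFP_PROTOTYPE | {"F65L"}
SB49 = BFP_PROTOTYPE | {"V164A"}
SB50 = SB42 | SB49  # combination of the SB42 and SB49 mutations

MUTANTS = {
    "SG12": SG12, "SG11": SG11, "SG25": SG25,
    "BFP": BFP_PROTOTYPE, "SB42": SB42, "SB49": SB49, "SB50": SB50,
}


def added_relative_to(child: str, parent: str) -> list:
    """Return the substitutions present in `child` but not in `parent`."""
    return sorted(MUTANTS[child] - MUTANTS[parent])


if __name__ == "__main__":
    for name, substitutions in MUTANTS.items():
        print(name, sorted(substitutions))
```

Writing the mutants this way makes it easy to see, for instance, that SG25 differs from SG11 only by the serine-to-cysteine change at position 66.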

The novel GFP and BFP mutants of this invention
allow for a significantly more sensitive detection of
fluorescence in host cells than is possible with the wild type
protein. Accordingly, the mutant GFPs provided herein can be
used, among other things, as sensitive reporter molecules to
detect the cell and tissue-specific expression and subcellular
compartmentalization of GFP or of chimeric proteins comprising
GFP fused to a regulatory sequence or to a second protein
sequence. In addition, these mutations make possible a
variety of one and two color protein assays to quantitate
expression in mammalian cells.



DETAILED DESCRIPTION OF THE INVENTION 

The present invention comprises mutant nucleic acids 
that encode engineered GFPs having a greater cellular 
fluorescence than either native GFP or unaltered ("wild type") 
recombinant GFP, and the mutant GFPs themselves. It further
comprises a subset of mutant GFPs that are mutant blue
fluorescent proteins ("BFPs") that are derived from a
published BFP, designated BFP(Tyr67→His), wherein the mutant
BFPs have a cellular fluorescence that is at least five times
greater, preferably ten times greater, and most preferably 20
times greater than that of BFP(Tyr67→His). The invention also
encompasses compositions such as vectors and cells that 
comprise either the mutant nucleic acids or the mutant protein 
gene products. The mutant GFP nucleic acids and proteins may 
be used to detect and quantify gene expression in living 
cells, and to detect and quantify tissue specific expression 
and subcellular distribution of GFP or of GFP fused to other 
proteins.

I. General Definitions

Unless defined otherwise, all technical and 
scientific terms used herein have the same meaning as commonly 
understood by one of ordinary skill in the art to which this 
invention belongs. Singleton et al. (1994) Dictionary of
Microbiology and Molecular Biology, second edition, John Wiley 
and Sons (New York) provides one of skill with a general 
dictionary of many of the terms used in this invention. 
Although any methods and materials similar or equivalent to 
those described herein can be used in the practice or testing 
of the present invention, the preferred methods and materials 
are described. For purposes of the present invention, the 
following terms are defined below. 
The symbols, abbreviations and definitions used
herein are set forth below:

DNA, deoxyribonucleic acid
RNA, ribonucleic acid
mRNA, messenger RNA
cDNA, complementary DNA (enzymatically synthesized from an
mRNA sequence)
A-Adenine
T-Thymine
G-Guanine
C-Cytosine
U-Uracil
GFP, Green Fluorescent Protein
BFP, Blue Fluorescent Protein

Amino acids are sometimes referred to herein by the 
conventional one or three letter codes. 

Wild type green fluorescent protein ("wtGFP") refers 
to the 239 amino acid sequence described by Chalfie et al.,
Science 263, 802-805, 1994, the nucleotide sequence of which
is set out as SEQ ID NO:1, and the amino acid sequence of
which is set out as SEQ ID NO:2. This sequence differs from
the original 238 amino acid GFP isolated from the
bioluminescent jellyfish Aequorea victoria in that one amino
acid has been inserted after position 2 of the 238 amino acid
sequence. When reference in this application is made to an
amino acid position of GFP, the position is made with
reference to that described by Chalfie et al., supra and thus
of SEQ ID NO:2.

The term "blue fluorescent protein" (BFP) refers to
mutants of wtGFP wherein the tyrosine at position 67 is
converted to a histidine, which mutants emit a blue
fluorescence. The non-limiting prototype is herein designated
BFP(Tyr67→His).

A shorthand designation for mutations that result in
a change in amino acid sequence is the one or three letter
code for the original amino acid, the number of the position
of the amino acid in the wtGFP sequence, followed by the one
or three letter code for the new amino acid. Thus, Phe65Leu
or F65L both designate a mutation wherein the phenylalanine at
position 65 of the wtGFP is converted to leucine.
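
The shorthand just defined is regular enough to be read mechanically. The following is a minimal sketch of a parser that accepts either the one letter or the three letter form and normalizes both to one-letter codes; the function name and the returned tuple layout are illustrative choices, not part of the disclosure.

```python
import re
from typing import Tuple

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = set(THREE_TO_ONE.values())

# original residue, position, new residue (one or three letters each)
_MUTATION = re.compile(r"^([A-Za-z]{3}|[A-Za-z])(\d+)([A-Za-z]{3}|[A-Za-z])$")


def _to_one_letter(code: str) -> str:
    """Normalize a one or three letter amino acid code to one letter."""
    if len(code) == 3:
        return THREE_TO_ONE[code.capitalize()]  # KeyError for unknown codes
    code = code.upper()
    if code not in ONE_LETTER:
        raise KeyError(code)
    return code


def parse_mutation(shorthand: str) -> Tuple[str, int, str]:
    """Parse e.g. 'Phe65Leu' or 'F65L' into ('F', 65, 'L')."""
    match = _MUTATION.match(shorthand.strip())
    if match is None:
        raise ValueError(f"not a mutation shorthand: {shorthand!r}")
    original, position, new = match.groups()
    return _to_one_letter(original), int(position), _to_one_letter(new)


if __name__ == "__main__":
    print(parse_mutation("Phe65Leu"))
    print(parse_mutation("F65L"))
```

Both spellings of the same mutation parse to the same tuple, which makes it straightforward to compare designations written in different styles.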

Salts of any of the proteins described herein will 
naturally occur when such proteins are present in (or isolated
from) aqueous solutions of various pHs. All salts of peptides
having the indicated biological activity are considered to be
within the scope of the present invention. Examples include
alkali, alkaline earth, and other metal salts of carboxylic
acid residues, acid addition salts (e.g., HCl) of amino
residues, and zwitterions formed by reactions between
carboxylic acid and amino acid residues within the same
molecule.

The terms "bioluminescent" and "fluorescent" refer
to the ability of GFP or of a derivative thereof to emit light 
("emitted or fluorescent light") of a characteristic 
wavelength when excited by light which is generally of a 
characteristic and different wavelength than that used to 
generate the emission. 

The term "cellular fluorescence" denotes the 
fluorescence of a GFP-derived protein of the present invention 
when expressed in a cell, especially a mammalian cell. 

The term "nucleic acid" refers to a 
deoxyribonucleotide or ribonucleotide polymer in either 
single- or double-stranded form, and unless specifically
limited, encompasses known analogues of natural nucleotides 
that hybridize to nucleic acids in a manner similar to 
naturally occurring nucleotides. Unless otherwise indicated, 
a particular nucleic acid sequence implicitly provides the 
complementary sequence thereof, as well as the sequence 
explicitly indicated. As used herein, the terms "nucleic 
acid" and "gene" are interchangeable, and they encompass the 
term "cDNA."

The phrase "a nucleic acid sequence encoding" refers 
to a nucleic acid which contains sequence information that, if 
translated, yields the primary amino acid sequence of a 
specific protein or peptide. This phrase specifically 
encompasses degenerate codons (i.e., different codons which 
encode a single amino acid) of the native sequence or 
sequences which may be introduced to conform with codon 
preference in a specific host cell. 
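
Codon degeneracy can be illustrated with the standard genetic code. The sketch below builds the standard codon table (DNA alphabet, bases ordered T, C, A, G) and lists the alternative codons for a given amino acid; the helper names are hypothetical, and the choice of which synonymous codon suits a particular host is outside the scope of this example.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def degenerate_codons(amino_acid: str) -> list:
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def encode_same_amino_acid(codon_a: str, codon_b: str) -> bool:
    """True if two codons are synonymous under the standard code."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


if __name__ == "__main__":
    print("Leu:", degenerate_codons("L"))
    print("Phe:", degenerate_codons("F"))
```

Leucine, for example, is encoded by six different codons, so many distinct nucleic acid sequences encode the same leucine-containing protein.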

The phrase "nucleic acid construct" denotes a
nucleic acid that is composed of two or more nucleic acid
sequences that are derived from different sources and that are
ligated together using methods known in the art.

WO 97/42320 PCT/US97/07625

The term "regulatory sequence" denotes all the non-
coding elements of a nucleic acid sequence required for the
correct and efficient expression of the "coding region" (i.e.,
the region that actually encodes the amino acid sequence of a
peptide or protein), e.g., binding sites for polymerases and
transcription factors, transcription and translation
initiation and termination sequences, TATA box, a promoter to
direct transcription, a ribosome binding site for
translational initiation, polyadenylation sequences, enhancer
elements.

The term "isolated" refers to material which is
substantially or essentially free from components which
normally accompany it as found in its native state (for
example, a band on a gel). The isolated nucleic acids and the
isolated proteins of this invention do not contain materials
normally associated with their in situ environment, in
particular, nuclear, cytosolic or membrane associated proteins
or nucleic acids other than those nucleic acids which are
indicated. The term "homogeneous" refers to a peptide or DNA
sequence where the primary molecular structure (i.e., the
sequence of amino acids or nucleotides) of substantially all
molecules present in the composition under consideration is
identical. The term "substantially" used in the preceding
sentences preferably means at least 80% by weight, more
preferably at least 95% by weight, and most preferably at
least 99% by weight.

The nucleic acids of this invention, whether RNA,
cDNA, genomic DNA, or a hybrid of the various combinations,
are synthesized in vitro or are isolated from natural sources
or recombinant clones. The nucleic acids claimed herein are
present in transformed or transfected whole cells, in
transformed or transfected cell lysates, or in a partially
purified or substantially pure form. The nucleic acids of the
present invention are obtained as homogeneous preparations.
They may be prepared by standard techniques well known in the
art, including selective precipitation with such substances as
ammonium sulfate, isopropyl alcohol, ethyl alcohol, and/or
exclusion, ion exchange or affinity column chromatography,
immunopurification methods, and others.

The phrase "conservatively modified variants 
thereof," when used with reference to a protein, denotes 
conservative amino acid substitutions in which both the 
original and the substituted amino acids have similar 
structure (e.g., the R group contains a carboxylic acid) and 
properties (e.g., the original and the substituted amino acids 
are acidic, such as glutamic and aspartic acid), such that the
substitutions do not essentially alter specified properties of 
the protein, such as fluorescence. Amino acid substitutions 
that are conservative are well known in the art. The phrase 
"conservatively modified variants thereof," when used to 
describe a reference nucleic acid, denotes nucleic acids 
having nucleotide substitutions that yield degenerate codons 
for a given amino acid or that encode conservative amino acid 
substitutions, as compared to the reference nucleic acid. 

The term "recombinant" or "engineered" when used 
with reference to a nucleic acid or a protein generally 
denotes that the composition or primary sequence of said 
nucleic acid or protein has been altered from the naturally 
occurring sequence using experimental manipulations well known 
to those skilled in the art. It may also denote that a 
nucleic acid or protein has been isolated and cloned into a 
vector, or that the nucleic acid has been introduced into
or expressed in a cell or cellular environment other than the
cell or cellular environment in which said nucleic acid or
protein may be found in nature. The phrase "engineered
Aequorea victoria fluorescent protein" specifically
encompasses a protein obtained by introducing one or more
sequence alterations into the coding region of a nucleic acid
that encodes wild type Aequorea victoria GFP, wherein the gene
product of the engineered nucleic acid is a fluorescent
protein recognized by antisera to wild type Aequorea victoria
GFP.



The term "recombinant" or "engineered" when used
with reference to a cell indicates that, as a result of
experimental manipulation, the cell replicates or expresses a
nucleic acid or expresses a peptide or protein encoded by a
nucleic acid, whose origin is exogenous to the cell.
Recombinant cells can express nucleic acids that are not found
within the native (non-recombinant) form of the cell.
Recombinant cells can also express nucleic acids found in the
native form of the cell wherein the nucleic acids are re-
introduced into the cell by artificial means.

The term "vector" denotes an engineered nucleic acid
construct that contains sequence elements that mediate the
replication of the vector sequence and/or the expression of
coding sequences present on the vector. Examples of vectors
include eukaryotic and prokaryotic plasmids, viruses (for
example, the HIV virus), cosmids, phagemids, and the like.

The term "operably linked" refers to functional linkage
between a first nucleic acid (for example, an expression
control sequence such as a promoter or an array of
transcription factor binding sites) and a second nucleic acid
sequence, wherein the expression control sequence directs
transcription of the nucleic acid corresponding to the second
sequence. One or more selected isolated nucleic acids may be
operably linked to a vector by methods known in the art.

"Transduction" or "transformation" denotes the 
process whereby exogenous extracellular DNA is introduced into
a cell, such that the cell is capable of replicating and/or
expressing the exogenous DNA. Generally, a selected nucleic
acid is first inserted into a vector and the vector is then
introduced into the cell. For example, plasmid DNA that is
introduced under appropriate environmental conditions may
undergo replication in the transformed cell, and the
replicated copies are distributed to progeny cells when cell
division occurs. As a result, a new cell line is established,
containing the plasmid and carrying the genetic determinants
thereof. Transformation by a plasmid in this manner, where
the plasmid genes are maintained in the cell line by plasmid
replication, occurs at high frequency when the transforming
plasmid DNA is in closed loop form, and rarely or never
occurs if linear plasmid DNA is used.
All the patents and publications cited in this
disclosure are indicative of the level of skill of those
skilled in the art to which this invention pertains and are
all herein individually incorporated by reference for all
purposes.

II. The GFP Mutants and Their Expression

A. The GFP mutants 

The isolated nucleic acids reported here are those 
that encode an engineered protein derived from Aequorea
victoria green fluorescent protein ("GFP") having a 
fluorescence at maximum emission that is at least five times 
greater, preferably ten times greater, and most preferably 
twenty times greater than the fluorescence at maximum emission 
of wild type GFP. In one embodiment, a nucleic acid encodes 
for leucine at amino acid position 65. This amino acid 
position is important for the enhanced fluorescence. In 
another embodiment the engineered isolated GFP nucleic acid 
also encodes for threonine at amino acid position 168. In an 
additional embodiment, the engineered isolated GFP nucleic 
acid further encodes for cysteine at amino acid position 66. 
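
The five-, ten- and twenty-fold thresholds named above can be expressed as a simple comparison of fluorescence readings taken at maximum emission. This is a hedged sketch only: the function names are hypothetical, the readings are assumed to be background-corrected values in the same arbitrary units, and no measurement protocol is implied.

```python
def fold_enhancement(mutant_signal: float, wild_type_signal: float) -> float:
    """Ratio of mutant to wild type fluorescence at maximum emission."""
    if wild_type_signal <= 0:
        raise ValueError("wild type signal must be positive")
    return mutant_signal / wild_type_signal


def preference_tier(fold: float) -> str:
    """Classify a fold enhancement against the thresholds stated in the text."""
    if fold >= 20:
        return "most preferred (at least 20-fold)"
    if fold >= 10:
        return "preferred (at least 10-fold)"
    if fold >= 5:
        return "minimum (at least 5-fold)"
    return "below the 5-fold threshold"


if __name__ == "__main__":
    # Hypothetical readings in arbitrary fluorescence units.
    print(preference_tier(fold_enhancement(120.0, 10.0)))
```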

Also described here are GFP mutants that have 
enhanced blue fluorescent properties. One such mutant is
encoded by an isolated nucleic acid that encodes an engineered
Aequorea victoria blue fluorescent protein having histidine
at amino acid position 67 and leucine at amino acid position 65,
and having a cellular fluorescence that is at least five times
greater, preferably 10 times greater, most preferably 20 times
greater than that of BFP(Tyr67→His). An alternative isolated
BFP nucleic acid is one that encodes for an engineered
Aequorea victoria blue fluorescent protein wherein the
engineered BFP has histidine at amino acid position 67 and
alanine at amino acid position 164. A third engineered
isolated BFP nucleic acid sequence is one that has histidine 
at amino acid position 67, leucine at amino acid position 65 
and alanine at amino acid position 164. 
The nucleic acid and amino acid sequences for the
wild type GFP are set out in SEQ ID NO:1 and SEQ ID NO:2. The
sequence is well-known, well-described and readily available
for manipulation and use. Vectors bearing the nucleic acid
sequence are readily available commercially from, for example,
Clontech Laboratories, Inc., Palo
Alto, CA. Clontech provides a line of reporter vectors for
GFP, including the cDNA construct described by Chalfie, et
al., supra, a promoterless GFP vector for monitoring the
expression of cloned promoters in mammalian cells, and a
series of vectors for creating fusion proteins to either the
amino or carboxy terminus of GFP.

One of skill in the art will recognize many ways of
generating alterations in a given nucleic acid sequence. Such
well-known methods include site-directed mutagenesis, PCR
amplification using degenerate oligonucleotides, exposure of
cells containing the nucleic acid to mutagenic agents or
radiation, chemical synthesis of a desired oligonucleotide
(e.g., in conjunction with ligation and/or cloning to generate
large nucleic acids) and other well-known techniques. See,
e.g., Berger and Kimmel, Guide to Molecular Cloning
Techniques, Methods in Enzymology Volume 152 Academic Press,
Inc., San Diego, CA (Berger); Sambrook et al. (1989) Molecular
Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring
Harbor Laboratory, Cold Spring Harbor Press, NY, (Sambrook);
and Current Protocols in Molecular Biology, F.M. Ausubel et
al., eds., Current Protocols, a joint venture between Greene
Publishing Associates, Inc. and John Wiley & Sons, Inc., (1994
Supplement) (Ausubel); Pirrung et al., U.S. Patent No.
5,143,854; and Fodor et al., Science, 251, 767-77 (1991).

Product information from manufacturers of biological reagents
and experimental equipment also provides information useful in
known biological methods. Such manufacturers include the
SIGMA Chemical Company (Saint Louis, MO), R&D Systems
(Minneapolis, MN), Pharmacia LKB Biotechnology (Piscataway,
NJ), CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem Genes
Corp., Aldrich Chemical Company (Milwaukee, WI), Glen
Research, Inc., GIBCO BRL Life Technologies, Inc.
(Gaithersburg, MD), Fluka Chemica-Biochemika Analytika (Fluka
Chemie AG, Buchs, Switzerland), and Applied Biosystems (Foster
City, CA), as well as many other commercial sources known to
one of skill. Using these techniques, it is possible to 
substitute at will any nucleotide in a nucleic acid that 
encodes any GFP or BFP disclosed herein or any amino acid in a 
GFP or BFP described herein for a predetermined nucleotide or 
amino acid. For example, it is possible to generate at will 
modified GFPs and BFP(Tyr67→His)s that contain leucine at
position 65 and one or two or three additional mutations at
any other position of the wtGFP or BFP(Tyr67→His).

The sequence of the cloned genes and synthetic 
oligonucleotides can be verified using the chemical 
degradation method of A.M. Maxam et al. (1980), Methods in
Enzymology 65:499-560. The sequence can be confirmed after
the assembly of the oligonucleotide fragments into the
double-stranded DNA sequence using the method of Maxam and
Gilbert, supra, or the chain termination method for sequencing
double-stranded templates of R.B. Wallace et al. (1981), Gene,
16:21-26. DNA sequencing may also be performed by the
PCR-assisted fluorescent terminator method (ReadyReaction
DyeDeoxy Terminator Cycle Sequencing Kit, ABI, Columbia, MD)
according to the manufacturer's instructions, using the ABI
Model 373A DNA Sequencing System. Sequencing data is analyzed
using the commercially available Sequencher program (Gene
Codes, Ann Arbor, MI).

B. Expression of Mutant GFP

Clearly, the nucleic acid sequences of the present 
invention are excellent reporter sequences since the expressed 
proteins can be readily detected by fluorescence as described 
below. The sequences can be used in conjunction with any 
application appreciated to date for GFP and further in 
applications where a greater degree of fluorescence is 
required. Expression of the sequences described herein,
whether alone or in combination with other sequences of
interest, is described below.

Vectors to which selected foreign nucleic acids are
operably linked may be used to introduce these selected
nucleic acids into host cells and mediate their replication
and/or expression. Cloning vectors are useful for replicating
the foreign nucleic acids and obtaining clones of specific
foreign nucleic acid-containing vectors. Expression vectors
mediate the expression of the foreign nucleic acid. Some
vectors are both cloning and expression vectors.

Once a nucleic acid is synthesized or isolated and
inserted into a vector and cloned, one may express the nucleic
acid in a variety of recombinantly engineered cells known to
those of skill in the art. As used herein, "expression"
refers to transcription of nucleic acids, either without or
preferably with subsequent translation.

Expression of a mutant BFP or of wild type or mutant
GFP can be enhanced by including multiple copies of the
GFP-encoding nucleic acid in a transformed host, by selecting a
vector known to reproduce in the host, thereby producing large
quantities of protein from exogenous inserted DNA (such as
pUC8, ptac12, or pIN-III-ompA1, 2, or 3), or by any other
known means of enhancing peptide expression. In all cases,
wtGFP or mutant GFPs will be expressed when the DNA sequence
is functionally inserted into a vector. "Functionally
inserted" means that it is inserted in proper reading frame
and orientation. Typically, a GFP gene will be inserted
downstream from a promoter and will be followed by a stop
codon, although production as a hybrid protein followed by
cleavage may be used, if desired.
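
The reading-frame and orientation requirement can be checked
computationally before cloning. The following Python sketch is one
possible check, not a method described in this specification: the
function names and the twelve-nucleotide toy insert are hypothetical,
and the sketch assumes the insert is supplied as a bare coding sequence
that starts at its ATG and ends at its stop codon.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def orf_problems(insert: str) -> List[str]:
    """List the reasons why insert is not a clean open reading frame."""
    dna = insert.upper()
    problems = []
    if not dna.startswith("ATG"):
        problems.append("no ATG start codon at position 1")
    if len(dna) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature in-frame stop codon")
    return problems


def insertion_status(insert: str) -> str:
    """Classify an insert as read 5'->3' downstream of the promoter."""
    if not orf_problems(insert):
        return "functional"
    if not orf_problems(reverse_complement(insert)):
        return "reversed orientation"
    return "not functional"


# Hypothetical toy insert: ATG GCT AAA TAA (Met-Ala-Lys-stop).
toy_insert = "ATGGCTAAATAA"
```

An insert cloned backwards is reported separately from one that is out
of frame, because the two faults call for different repairs (re-ligation
in the opposite orientation versus adjusting the junction).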

Examples of cells which are suitable for the cloning
and expression of the nucleic acids of the invention include
bacteria, yeast, filamentous fungi, insect (especially
employing baculoviral vectors), and mammalian cells, in
particular cells capable of being maintained in tissue
culture.

Host cells are competent or rendered competent for
transformation by various means. There are several well-known
methods of introducing DNA into animal cells. These include:
calcium phosphate precipitation, fusion of the recipient cells
with bacterial protoplasts containing the DNA, treatment of
the recipient cells with liposomes containing the DNA, DEAE
dextran, receptor-mediated endocytosis, electroporation and
micro-injection of the DNA directly into the cells.

It is expected that those of skill in the art are 
knowledgeable in the numerous systems available for cloning 
and expression of nucleic acids. In brief summary, the 
expression of natural or synthetic nucleic acids is typically 
achieved by operably linking a nucleic acid of interest to a 
promoter (which is either constitutive or inducible), and
incorporating the construct into an expression vector. The 
vectors are suitable for replication and integration in 
prokaryotes, eukaryotes, or both. Typical cloning vectors 
contain transcription and translation terminators, 
transcription and translation initiation sequences, and 
promoters useful for regulation of the expression of the 
particular nucleic acid. The vectors optionally comprise 
generic expression cassettes containing at least one 
independent terminator sequence, sequences permitting 
replication of the cassette in eukaryotes, or prokaryotes, or
both (e.g., shuttle vectors), and selection markers for both
prokaryotic and eukaryotic systems. See, e.g., Sambrook and
Ausubel (both supra).

1. Expression in Prokaryotes 

Prokaryotic systems for cloning and/or expressing
engineered GFP or BFP proteins are available using E. coli,
Bacillus sp. and Salmonella (Palva, I. et al. (1983), Gene
22:229-235; Mosbach, K. et al. (1983), Nature 302:543-545). To
obtain high level expression in a prokaryotic system of a
cloned nucleic acid such as those encoding engineered GFPs or
BFPs, it is essential to construct expression vectors which
contain, at a minimum, a strong promoter to direct
transcription, a ribosome binding site for translational
initiation, a transcription/translation terminator, a
bacterial replicon, a nucleic acid encoding antibiotic
resistance to permit selection of bacteria that harbor
recombinant plasmids, and unique restriction sites in
nonessential regions of the plasmid to allow insertion of
foreign nucleic acids. The particular antibiotic resistance
gene chosen is not critical; any of the many resistance genes
known in the art are suitable. Examples of regulatory regions
suitable for this purpose in E. coli are the promoter and
operator region of the E. coli tryptophan biosynthetic pathway
as described by Yanofsky, C. (1984), J. Bacteriol.,
158:1018-1024, and the leftward promoter of phage lambda (PL)
as described by Herskowitz, I. and Hagen, D. (1980), Ann. Rev.
Genet., 14:399-445.

The particular vector used to transport the genetic 
information into the cell is not particularly critical. Any 
of the conventional vectors used for replication, cloning 
and/or expression in prokaryotic cells may be used.

The foreign nucleic acid can be incorporated into a
nonessential region of the host cell's chromosome. This is
achieved by first inserting the nucleic acid into a vector
such that it is flanked by regions of DNA homologous to the
insertion site in the host chromosome. After introduction of
the vector into a host cell, the foreign nucleic acid is
incorporated into the chromosome by homologous recombination
between the flanking sequences and chromosomal DNA.

Detection of the expressed protein is achieved by
methods known in the art, such as radioimmunoassays, Western
blotting techniques or immunoprecipitation. Purification from
E. coli can be achieved following procedures described in U.S.
Patent No. 4,511,503.



2. Expression in Eukaryotes 

Standard eukaryotic transfection methods are used to
produce mammalian, yeast or insect cell lines which express
large quantities of engineered GFP or BFP protein, which is
then purified using standard techniques. See, e.g., Colley et
al. (1989), J. Biol. Chem. 264:17619-17622; Guide to
Protein Purification, in Vol. 182 of Methods in Enzymology
(Deutscher ed., 1990); D.A. Morrison (1977), J. Bact.,
132:349-351; and J.E. Clark-Curtiss and R. Curtiss (1983),
Methods in Enzymology 101:347-362, Eds. R. Wu et al.,
Academic Press, New York.

The particular eukaryotic expression vector used to 
transport the genetic information into the cell is not 
particularly critical. Any of the conventional vectors used 
for expression in eukaryotic cells may be used. Expression 
vectors containing regulatory elements from eukaryotic viruses 
such as retroviruses are typically used. SV40 vectors include 
pSVT7 and pMT2. Vectors derived from bovine papilloma virus
include pBV-1MTHA, and vectors derived from Epstein Barr virus
include pHEBO, and p205. Other exemplary vectors include
pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and
any other vector allowing expression of proteins under the
direction of the SV-40 early promoter, SV-40 late promoter,
metallothionein promoter, murine mammary tumor virus promoter, 
Rous sarcoma virus promoter, polyhedrin promoter, or other 
promoters shown effective for expression in eukaryotic cells. 

The expression vector typically comprises a 
eukaryotic transcription unit or expression cassette that 
contains all the elements required for the expression of the 
engineered GFP or BFP DNA in eukaryotic cells. A typical 
expression cassette contains a promoter operably linked to the 
DNA sequence encoding an engineered GFP or BFP protein and
signals required for efficient polyadenylation of the
transcript.

Eukaryotic promoters typically contain two types of 
recognition sequences, the TATA box and upstream promoter 
elements. The TATA box, located 25-30 base pairs upstream of 
the transcription initiation site, is thought to be involved 
in directing RNA polymerase to begin RNA synthesis. The other 
upstream promoter elements determine the rate at which 
transcription is initiated. 

Enhancer elements can stimulate transcription up to 
1,000 fold from linked homologous or heterologous promoters. 
Enhancers are active when placed downstream or upstream from 
the transcription initiation site. Many enhancer elements 
derived from viruses have a broad host range and are active in 
a variety of tissues. For example, the SV40 early gene
enhancer is suitable for many cell types. Other
enhancer/promoter combinations that are suitable for the
present invention include those derived from polyoma virus,
human or murine cytomegalovirus, the long terminal repeat from
various retroviruses such as murine leukemia virus, murine or
Rous sarcoma virus and HIV. See, Enhancers and Eukaryotic
Expression, Cold Spring Harbor Press, Cold Spring Harbor, N.Y.
1983, which is incorporated herein by reference.

In the construction of the expression cassette, the
promoter is preferably positioned about the same distance from
the heterologous transcription start site as it is from the
transcription start site in its natural setting. As is known
in the art, however, some variation in this distance can be
accommodated without loss of promoter function.

In addition to a promoter sequence, the expression
cassette should also contain a transcription termination
region downstream of the structural gene to provide for
efficient termination. The termination region may be obtained
from the same gene as the promoter sequence or may be obtained
from different genes.

If the mRNA encoded by the structural gene is to be
efficiently translated, polyadenylation sequences are also
commonly added to the vector construct. Two distinct sequence
elements are required for accurate and efficient
polyadenylation: GU or U rich sequences located downstream
from the polyadenylation site and a highly conserved sequence
of six nucleotides, AAUAAA, located 11-30 nucleotides
upstream. Termination and polyadenylation signals that are
suitable for the present invention include those derived from
SV40, or a partial genomic copy of a gene already resident on
the expression vector.
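
The two sequence elements named above can be looked for in a candidate
3' region with a short script. The Python sketch below is illustrative
only and rests on stated assumptions: the distance of 11-30 nucleotides
is measured here from the 3' end of the AAUAAA hexamer to the
polyadenylation site, the 30-nucleotide downstream window is an
arbitrary choice, and the function names and toy transcript are
hypothetical.

```python
from typing import List

HEXAMER = "AAUAAA"


def find_upstream_hexamers(rna: str, site: int,
                           min_gap: int = 11, max_gap: int = 30) -> List[int]:
    """Return 0-based start indices of AAUAAA hexamers whose 3' end lies
    min_gap..max_gap nucleotides upstream of the polyadenylation site.

    site is the 0-based index of the first nucleotide after cleavage.
    """
    seq = rna.upper().replace("T", "U")  # accept DNA or RNA spelling
    hits = []
    start = seq.find(HEXAMER)
    while start != -1:
        gap = site - (start + len(HEXAMER))
        if min_gap <= gap <= max_gap:
            hits.append(start)
        start = seq.find(HEXAMER, start + 1)
    return hits


def gu_fraction(rna: str, site: int, window: int = 30) -> float:
    """Fraction of G or U residues in the window downstream of the site."""
    downstream = rna.upper().replace("T", "U")[site:site + window]
    if not downstream:
        return 0.0
    return sum(base in "GU" for base in downstream) / len(downstream)


# Hypothetical toy transcript: hexamer at index 10, cleavage site at 31,
# followed by a GU-rich stretch.
toy_rna = "C" * 10 + "AAUAAA" + "C" * 15 + "GUGUUUGUUU"
hexamer_hits = find_upstream_hexamers(toy_rna, site=31)
downstream_gu = gu_fraction(toy_rna, site=31)
```

No cut-off for the downstream GU fraction is suggested here, since the
text above gives none; the value is reported for the user to judge.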

In addition to the elements already described, the
expression vector of the present invention may typically
contain other specialized elements intended to increase the
level of expression of cloned nucleic acids or to facilitate
the identification of cells that carry the transfected DNA.
For instance, a number of animal viruses contain DNA sequences
that promote the extrachromosomal replication of the viral
genome in permissive cell types. Plasmids bearing these viral
replicons are replicated episomally as long as the appropriate
factors are provided by genes either carried on the plasmid or
within the genome of the host cell.

The DNA sequence encoding the engineered GFP or BFP 
protein may typically be linked to a cleavable signal peptide 
sequence to promote secretion of the encoded protein by the 
transformed cell. Such signal peptides would include, among 
others, the signal peptides from tissue plasminogen activator, 
insulin, neuron growth factor, and juvenile hormone esterase 
of Heliothis virescens. Additional elements of the cassette 
may include enhancers and, if genomic DNA is used as the 
structural gene, introns with functional splice donor and 
acceptor sites. 

The vector may or may not comprise a eukaryotic 
replicon. If a eukaryotic replicon is present, then the 
vector is amplifiable in eukaryotic cells using the 
appropriate selectable marker. If the vector does not 
comprise a eukaryotic replicon, no episomal amplification is 
possible. Instead, the transfected DNA integrates into the 
genome of the transfected cell, where the promoter directs 
expression of the desired nucleic acid. 

The vectors usually comprise selectable markers
which result in nucleic acid amplification such as the sodium,
potassium ATPase, thymidine kinase, aminoglycoside
phosphotransferase, hygromycin B phosphotransferase,
xanthine-guanine phosphoribosyl transferase, CAD (carbamyl
phosphate synthetase, aspartate transcarbamylase, and
dihydroorotase), adenosine deaminase, dihydrofolate reductase,
and asparagine synthetase and ouabain selection.
Alternatively, high yield expression systems not involving
nucleic acid amplification are also suitable, such as using a
baculovirus vector in insect cells, with the engineered GFP
or BFP encoding sequence under the direction of the polyhedrin
promoter or other strong baculovirus promoters.

The expression vectors of the present invention will 
typically contain both prokaryotic sequences that facilitate 
the cloning of the vector in bacteria as well as one or more
eukaryotic transcription units that are expressed only in
eukaryotic cells, such as mammalian cells. The prokaryotic
sequences are preferably chosen such that they do not
interfere with the replication of the DNA in eukaryotic cells.

Any of the well known procedures for introducing
foreign nucleotide sequences into host cells may be used.
These include the use of calcium phosphate transfection,
polybrene, protoplast fusion, electroporation, liposomes,
microinjection, plasmid vectors, viral vectors and any of the
other well known methods for introducing cloned genomic DNA,
cDNA, synthetic DNA or other foreign nucleic acid material
into a host cell (see Sambrook et al., supra). It is only
necessary that the particular genetic engineering procedure
utilized be capable of successfully introducing at least one
nucleic acid into the host cell which is capable of expressing
the engineered GFP or BFP protein.

3. Expression in insect cells 

The baculovirus expression vector utilizes the
highly expressed and regulated Autographa californica nuclear
polyhedrosis virus (AcMNPV) polyhedrin promoter modified for
the insertion of foreign nucleic acids. Synthesis of
polyhedrin protein results in the formation of occlusion
bodies in the infected insect cell. The baculovirus vector
utilizes many of the protein modification, processing, and
transport systems that occur in higher eukaryotic cells. The
recombinant eukaryotic proteins expressed using this vector
have been found in many cases to be antigenically,
immunogenically, and functionally similar to their natural
counterparts.

Briefly, a DNA sequence encoding an engineered GFP
or BFP is inserted into a transfer plasmid vector in the
proper orientation downstream from the polyhedrin promoter,
and flanked on both ends with baculovirus sequences. Cultured
insect cells, commonly Spodoptera frugiperda cells, are
transfected with a mixture of viral and plasmid DNAs. The
viruses that develop, some of which are recombinant viruses that
result from homologous recombination between the two DNAs, are
plated at 100-1000 plaques per plate. The plaques containing
recombinant virus can be identified visually because of their
inability to form occlusion bodies or by DNA hybridization. The
recombinant virus is isolated by plaque purification. The
resulting recombinant virus, capable of expressing engineered
GFP or BFP, is self-propagating in that no helper virus is
required for maintenance or replication. After infecting an
insect culture with recombinant virus, one can expect to find
recombinant protein within 48-72 hours. The infection is
essentially lytic within 4-5 days.

There are a variety of transfer vectors into which 
the engineered GFP or BFP nucleic acid can be inserted. For a 
summary of transfer vectors see Luckow, V.A. and Summers, M.D. 
(1988), Bio/Technology 6:47-55. Preferred is the transfer 
vector pAcUW21 described by Bishop, D.H.L. (1992) in Seminars 
in Virology 3:253-264. 

4. Retroviral Vectors

Retroviral vectors are particularly useful for 
modifying eukaryotic cells because of the high efficiency with 
which the retroviral vectors transduce target cells and 
integrate into the target cell genome. Additionally, the 
retroviruses harboring the retroviral vector are capable of
infecting cells from a wide variety of tissues. 

Retroviral vectors are produced by genetically 
manipulating retroviruses. Retroviruses are RNA viruses 
because the viral genome is RNA. Upon infection, this genomic 
RNA is reverse transcribed into a DNA copy which is integrated 
into the chromosomal DNA of transduced cells with a high 
degree of stability and efficiency. The integrated DNA copy 
is referred to as a provirus and is inherited by daughter 
cells as is any other gene. The wild type retroviral genome 
and the proviral DNA have three genes: the gag, the pol and 
the env genes, which are flanked by two long terminal repeat 
(LTR) sequences. The gag gene encodes the internal structural 
(nucleocapsid) proteins; the pol gene encodes the RNA directed 
DNA polymerase (reverse transcriptase); and the env gene
encodes viral envelope glycoproteins. The 5' and 3' LTRs
serve to promote transcription and polyadenylation of virion
RNAs. Adjacent to the 5' LTR are sequences necessary for
reverse transcription of the genome (the tRNA primer binding
site) and for efficient encapsulation of viral RNA into
particles (the Psi site). See Mulligan, R.C. (1983), In:
Experimental Manipulation of Gene Expression, M. Inouye (ed),
155-173; Mann, R. et al. (1983), Cell, 33:153-159; Cone, R.D.
and R.C. Mulligan (1984), Proceedings of the National Academy
of Sciences, U.S.A. 81:6349-6353.

The design of retroviral vectors is well known to
one of skill in the art. See Singer, M. and Berg, P., supra.
In brief, if the sequences necessary for encapsidation (or
packaging of retroviral RNA into infectious virions) are
missing from the viral genome, the result is a cis-acting
defect which prevents encapsidation of genomic RNA. However,
the resulting mutant is still capable of directing the
synthesis of all virion proteins. Retroviral genomes from
which these sequences have been deleted, as well as cell lines
containing the mutant genome stably integrated into the
chromosome, are well known in the art and are used to construct
retroviral vectors. Preparation of retroviral vectors and
their uses are described in many publications including
European Patent Application EPA 0 178 220; U.S. Patent
4,405,712; Gilboa (1986), Biotechniques 4:504-512; Mann et
al. (1983), Cell 33:153-159; Cone and Mulligan (1984), Proc.
Natl. Acad. Sci. USA 81:6349-6353; Eglitis, M.A. et al. (1988),
Biotechniques 6:608-614; Miller, A.D. et al. (1989),
Biotechniques 7:981-990; Miller, A.D. (1992), Nature, supra;
Mulligan, R.C. (1993), supra; Gould, B. et al.; and
International Patent Application No. WO 92/07943 entitled
"Retroviral Vectors Useful in Gene Therapy." The teachings of
these patents and publications are incorporated herein by
reference.

The retroviral vector particles are prepared by
recombinantly inserting the nucleic acid encoding engineered
GFP or BFP into a retrovirus vector and packaging the vector
with retroviral capsid proteins by use of a packaging cell
line. The resultant retroviral vector particle is incapable
of replication in the host cell and is capable of integrating
into the host cell genome as a proviral sequence containing
the engineered GFP or BFP nucleic acid. As a result, the
transduced host cell is capable of producing engineered GFP or
BFP.

Packaging cell lines are used to prepare the 
retroviral vector particles. A packaging cell line is a 
genetically constructed mammalian tissue culture cell line 
that produces the necessary viral structural proteins required 
for packaging, but which is incapable of producing infectious 
virions. Retroviral vectors, on the other hand, lack the 
structural genes but have the nucleic acid sequences necessary 
for packaging. To prepare a packaging cell line, an 
infectious clone of a desired retrovirus, in which the 
packaging site has been deleted, is constructed. Cells 
comprising this construct will express all structural proteins 
but the introduced DNA will be incapable of being packaged. 
Alternatively, packaging cell lines can be produced by 
transforming a cell line with one or more expression plasmids 
encoding the appropriate core and envelope proteins. In these 
cells, the gag, pol, and env genes can be derived from the 
same or different retroviruses. 

A number of packaging cell lines suitable for the 
present invention are available in the prior art. Examples of 
these cell lines include Crip, GPE86, PA317 and PG13. See
Miller et al. (1991), J. Virol. 65:2220-2224, which is 
incorporated herein by reference. Examples of other packaging 
cell lines are described in Cone, R. and Mulligan, R.C. 
(1984), Proceedings of the National Academy of Sciences, 
U.S.A., 81:6349-6353 and in Danos, O. and R.C. Mulligan 
(1988), Proceedings of the National Academy of Sciences, 
U.S.A., 85:6460-6464, Eglitis, M.A. et al. (1988)
Biotechniques 6:608-614, also all incorporated herein by
reference.

Packaging cell lines capable of producing retroviral 
vector particles with chimeric envelope proteins may be used. 
Alternatively, amphotropic or xenotropic envelope proteins,
such as those produced by PA317 and GPX packaging cell lines,
may be used to package the retroviral vectors.

Transforming cells with nucleic acids can involve,
for example, incubating viral vectors (e.g.,
retroviral or adeno-associated viral vectors) containing the
nucleic acids with cells within the host range of the vector.
See, e.g., Methods
in Enzymology, Vol. 185, Academic Press, Inc., San Diego, CA
(D.V. Goeddel, ed.) (1990) or M. Krieger (1990), Gene Transfer
and Expression -- A Laboratory Manual, Stockton Press, New
York, NY, and the references cited therein.

5. Transformation with adeno-associated virus

Adeno-associated viruses (AAVs) require helper
viruses such as adenovirus or herpes virus to achieve
productive infection. In the absence of helper virus
functions, AAV integrates (site-specifically) into a host
cell's genome, but the integrated AAV genome has no pathogenic
effect. The integration step allows the AAV genome to remain
genetically intact until the host is exposed to the
appropriate environmental conditions (e.g., a lytic helper
virus), whereupon it re-enters the lytic life-cycle. Samulski
(1993), Current Opinion in Genetics and Development 3:74-80 and
the references cited therein provide an overview of the AAV
life cycle.

AAV-based vectors are used to transduce cells with
target nucleic acids, e.g., in the in vitro production of
nucleic acids and peptides, and in in vivo and ex vivo gene
therapy procedures. See West et al. (1987), Virology 160:38-47;
Carter et al. (1989) U.S. Patent No. 4,797,368; Carter et
al. (1993), WO 93/24641; Kotin (1994), Human Gene Therapy
5:793-801; Muzyczka (1994), J. Clin. Invest. 94:1351 and
Samulski (supra) for an overview of AAV vectors.

Recombinant AAV vectors (rAAV vectors) deliver
foreign nucleic acids to a wide range of mammalian cells
(Hermonat & Muzyczka (1984), Proc. Natl. Acad. Sci. USA
81:6466-6470; Tratschin et al. (1985), Mol. Cell Biol.
5:3251-3260), integrate into the host chromosome (McLaughlin
et al. (1988), J. Virol. 62:1963-1973), and show stable
expression of the transgene in cell and animal models (Flotte
et al. (1993), Proc. Natl. Acad. Sci. USA 90:10613-10617).
Moreover, unlike some retroviral vectors, rAAV vectors are
able to infect non-dividing cells (Podsakoff et al. (1994), J.
Virol. 68:5656-66; Flotte et al. (1994), Am. J. Respir. Cell
Mol. Biol. 11:517-521). Further advantages of rAAV vectors
include the lack of an intrinsic strong promoter, thus
avoiding possible activation of downstream cellular sequences,
and their naked icosahedral capsid structure, which renders
them stable and easy to concentrate by common laboratory
techniques. rAAV vectors are used to inhibit, e.g., viral
infection, by including anti-viral transcription cassettes in
the rAAV vector which comprise an inhibitor of the invention.

6. Expression in recombinant vaccinia virus-infected cells

The nucleic acid encoding engineered GFP or BFP is 
inserted into a plasmid designed for producing recombinant 
vaccinia, such as pGS62, Langford, C.L. et al. (1986), Mol.
Cell. Biol. 6:3191-3199. This plasmid consists of a cloning 
site for insertion of foreign nucleic acids, the P7.5 promoter 
of vaccinia to direct synthesis of the inserted nucleic acid, 
and the vaccinia TK gene flanking both ends of the foreign 
nucleic acid. 

When the plasmid containing the engineered GFP or 
BFP nucleic acid is constructed, the nucleic acid can be 
transferred to vaccinia virus by homologous recombination in 
the infected cell. To achieve this, the recombinant plasmid is
transfected by standard calcium phosphate precipitation
techniques into suitable recipient cells already infected
with the desired strain of vaccinia virus, such as
Wyeth, Lister, WR or Copenhagen. Homologous recombination
occurs between the TK gene in the virus and the flanking TK
gene sequences in the plasmid. This results in a recombinant
virus with the foreign nucleic acid inserted into the viral TK
gene, thus rendering the TK gene inactive. Cells containing
recombinant viruses are selected by adding medium containing
5-bromodeoxyuridine, which is lethal for cells expressing a TK
gene.

Confirmation of production of recombinant virus is
achieved by DNA hybridization using cDNA encoding the
engineered GFP or BFP and by immunodetection techniques using
antibodies specific for the expressed protein. Virus stocks
may be prepared by infection of cells such as HeLa S3 spinner
cells and harvesting of virus progeny.

7. Expression in cell cultures 

GFP- or BFP-encoding nucleic acids can be ligated to
various expression vectors for use in transforming host cell
cultures. The culture of cells used in conjunction with the
present invention is well known in the art. Freshney (1994)
(Culture of Animal Cells, a Manual of Basic Technique, third
edition, Wiley-Liss, New York), Kuchler et al. (1977)
Biochemical Methods in Cell Culture and Virology, Kuchler,
R.J., Dowden, Hutchinson and Ross, Inc., and the references
cited therein provide a general guide to the culture of
cells. Illustrative cell cultures useful for the production
of recombinant proteins include cells of insect or mammalian
origin. Mammalian cell systems often will be in the form of
monolayers of cells, although mammalian cell suspensions are
also used. Illustrative examples of mammalian cell lines
include monocytes, lymphocytes, macrophages, VERO and HeLa
cells, Chinese hamster ovary (CHO) cell lines, WI38, BHK,
COS-7 or MDCK cell lines (see, e.g., Freshney, supra).

Cells of mammalian origin are illustrative of cell
cultures useful for the production of the engineered GFP or
BFP. Mammalian cell systems often will be in the form of
monolayers of cells, although mammalian cell suspensions may
also be used. Illustrative examples of mammalian cell lines
include VERO and HeLa cells, Chinese hamster ovary (CHO) cell
lines, WI38, BHK, COS-7 or MDCK cell lines.

As indicated above, the vector, e.g., a plasmid,
which is used to transform the host cell, preferably contains
DNA sequences to initiate transcription and sequences to
control the translation of the engineered GFP or BFP nucleic
acid sequence. These sequences are referred to as expression
control sequences. Illustrative expression control sequences
are obtained from the SV-40 promoter (Science 222:524-527,
(1983)), the CMV I.E. promoter (Proc. Natl. Acad. Sci.
81:659-663, (1984)) or the metallothionein promoter (Nature 
296:39-42, (1982)). The cloning vector containing the 
expression control sequences is cleaved using restriction 
enzymes and adjusted in size as necessary or desirable and 
ligated with sequences encoding the engineered GFP or BFP 
protein by means well known in the art. 

The vectors for transforming cells in culture 
typically contain gene sequences to initiate transcription and 
translation of the engineered GFP or BFP gene. These 
sequences need to be compatible with the selected host cell. 
In addition, the vectors preferably contain a marker to 
provide a phenotypic trait for selection of transformed host 
cells such as dihydrofolate reductase or metallothionein. 
Additionally, a vector might contain a replicative origin. 

As mentioned above, when higher animal host cells
are employed, polyadenylation or transcription terminator
sequences from known mammalian genes need to be incorporated
into the vector. An example of a terminator sequence is the
polyadenylation sequence from the bovine growth hormone gene.
Sequences for accurate splicing of the transcript may also be
included. An example of a splicing sequence is the VP1 intron
from SV40 (Sprague, J. et al. (1983), J. Virol. 45:773-781).

Additionally, gene sequences to control replication
in the host cell may be incorporated into the vector, such as
those found in bovine papilloma virus-type vectors. See
Saveria-Campo, M. (1985), "Bovine Papilloma virus DNA: a
Eukaryotic Cloning Vector" in DNA Cloning Vol. II, a Practical
Approach, Ed. D.M. Glover, IRL Press, Arlington, Virginia, pp.
213-238.

The transformed cells are cultured by means well
known in the art, for example, as published in Kuchler, R.J.
et al. (1977), Biochemical Methods in Cell Culture and
Virology.

In addition to the above general procedures which
can be used for preparing recombinant DNA molecules and
transformed unicellular organisms in accordance with the
practices of this invention, other known techniques and
modifications thereof can be used in carrying out the practice
of the invention. Any known system for expression of isolated
genes is suitable for use in the present invention. For
example, viral expression systems such as the baculovirus
expression system are specifically contemplated within the
scope of the invention. Many recent U.S. patents disclose
plasmids, genetically engineered microorganisms, and methods
of conducting genetic engineering which can be used in the
practice of the present invention. For example, U.S. Pat. No.
4,273,875 discloses a plasmid and a process of isolating the
same. U.S. Pat. No. 4,304,863 discloses a process for
producing bacteria by genetic engineering in which a hybrid
plasmid is constructed and used to transform a bacterial host.
U.S. Pat. No. 4,419,450 discloses a plasmid useful as a
cloning vehicle in recombinant DNA work. U.S. Pat. No.
4,362,867 discloses recombinant cDNA construction methods and
hybrid nucleotides produced thereby which are useful in
cloning processes. U.S. Pat. No. 4,403,036 discloses genetic
reagents for generating plasmids containing multiple copies of
DNA segments. U.S. Pat. No. 4,363,877 discloses recombinant
DNA transfer vectors. U.S. Pat. No. 4,356,270 discloses a
recombinant DNA cloning vehicle and is a particularly useful
disclosure for those with limited experience in the area of
genetic engineering since it defines many of the terms used in
genetic engineering and the basic processes used therein.
U.S. Pat. No. 4,336,336 discloses a fused gene and a method of
making the same. U.S. Pat. No. 4,319,629 discloses plasmid
vectors and the production and use thereof. U.S. Pat. No.
4,332,901 discloses a cloning vector useful in recombinant
DNA. Although some of these patents are directed to the
production of a particular gene product that is not within the
scope of the present invention, the procedures described
therein can easily be adapted to the practice of the
invention described in this specification by those skilled in
the art of genetic engineering. Transferring the isolated GFP
cDNA to other expression vectors will produce constructs which
improve the expression of the GFP polypeptide in E. coli or
express GFP in other hosts.

III. Detection of GFP and BFP Nucleic Acids and Proteins

A. General detection methods 

The nucleic acids and proteins of the invention are 
detected, confirmed and quantified by any of a number of means 
well known to those of skill in the art. The unique quality 
of the inventive expressed proteins here is that they provide 
an enhanced fluorescence which can be readily and easily 
observed. Fluorescence assays for the expressed proteins are 
described in detail below. Other general methods for
detecting both nucleic acids and corresponding proteins
include analytic biochemical methods such as
spectrophotometry, radiography, electrophoresis, capillary
electrophoresis, high performance liquid chromatography
(HPLC), thin layer chromatography (TLC), hyperdiffusion
chromatography, and the like, and various immunological
methods such as fluid or gel precipitin reactions,
immunodiffusion (single or double), immunoelectrophoresis,
radioimmunoassays (RIAs), enzyme-linked immunosorbent assays
(ELISAs), immunofluorescent assays, and the like. The
detection of nucleic acids proceeds by well known methods such
as Southern analysis, northern analysis, gel electrophoresis,
PCR, radiolabeling, scintillation counting, and affinity
chromatography.

A variety of methods of specific DNA and RNA 
measurement using nucleic acid hybridization techniques are 
known to those of skill in the art. For example, one method 
for evaluating the presence or absence of engineered GFP or 
BFP DNA in a sample involves a Southern transfer (Southern
(1975), J. Mol. Biol. 98:503). Briefly, the digested
genomic DNA is run on agarose slab gels in buffer and 
transferred to membranes. Hybridization is carried out using 
the probes discussed above. Visualization of the hybridized 
portions allows the qualitative determination of the presence 
or absence of engineered GFP or BFP genes. 

Similarly, a Northern transfer may be used for the
detection of engineered GFP or BFP mRNA in samples of RNA from
cells expressing the engineered GFP or BFP gene. In brief,
the mRNA is isolated from a given cell sample using an acid
guanidinium-phenol-chloroform extraction method. The mRNA is
then electrophoresed to separate the mRNA species and the mRNA
is transferred from the gel to a nitrocellulose membrane. As
with the Southern blots, labeled probes are used to identify
the presence or absence of the engineered GFP or BFP
transcript.

The selection of a nucleic acid hybridization format 
is not critical. A variety of nucleic acid hybridization 
formats are known to those skilled in the art. For example, 
common formats include sandwich assays and competition or
displacement assays. Hybridization techniques are generally
described in "Nucleic Acid Hybridization, A Practical
Approach," Ed. Hames, B.D. and Higgins, S.J., IRL Press, 1985;
Gall and Pardue (1969), Proc. Natl. Acad. Sci. USA 63:378-383;
and John, Birnstiel and Jones (1969), Nature 223:582-587.

For example, sandwich assays are commercially useful
hybridization assays for detecting or isolating nucleic acid
sequences. Such assays utilize a "capture" nucleic acid
covalently immobilized to a solid support and labelled
"signal" nucleic acid in solution. The clinical sample will
provide the target nucleic acid. The "capture" nucleic acid
and "signal" nucleic acid probe hybridize with the target
nucleic acid to form a "sandwich" hybridization complex. To
be effective, the signal nucleic acid cannot hybridize with
the capture nucleic acid.

The nucleic acid sequences used in this invention
can be either positive or negative probes. Positive probes
bind to their targets and the presence of duplex formation is
evidence of the presence of the target. Negative probes fail
to bind to the suspect target and the absence of duplex
formation is evidence of the presence of the target. For
example, the use of a wild type specific nucleic acid probe or
PCR primers may act as a negative probe in an assay sample
where only the mutant engineered GFP or BFP is present.



WO 97/42320 



PCT/US97/07625 




Labelled signal nucleic acids, whether those
described herein or others known in the art, are used to detect
hybridization. Complementary nucleic acids or signal nucleic
acids may be labelled by any one of several methods typically
used to detect the presence of hybridized polynucleotides.
One common method of detection is the use of autoradiography
with ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P-labelled probes or the like.
Other labels include ligands which bind to labelled
antibodies, fluorophores, chemiluminescent agents, enzymes,
and antibodies which can serve as specific binding pair
members for a labelled ligand.

Detection of a hybridization complex may require the 
binding of a signal generating complex to a duplex of target 
and probe polynucleotides or nucleic acids. Typically, such 
binding occurs through ligand and anti-ligand interactions as
between a ligand-conjugated probe and an anti-ligand
conjugated with a signal. The binding of the signal
generation complex is also readily amenable to acceleration
by exposure to ultrasonic energy.

The label may also allow indirect detection of the 
hybridization complex. For example, where the label is a 
hapten or antigen, the sample can be detected by using 
antibodies. In these systems, a signal is generated by 
attaching fluorescent or enzyme molecules to the antibodies or 
in some cases, by attachment to a radioactive label. 
(Tijssen, P. (1985), "Practice and Theory of Enzyme 
Immunoassays," Laboratory Techniques in Biochemistry and 
Molecular Biology, Burdon, R.H., van Knippenberg, P.H., Eds., 
Elsevier, pp. 9-20.) 

The sensitivity of the hybridization assays may be 
enhanced through use of a nucleic acid amplification system 
which multiplies the target nucleic acid being detected. In 
vitro amplification techniques suitable for amplifying 
sequences for use as molecular probes or for generating 
nucleic acid fragments for subsequent subcloning are known. 
Examples of techniques sufficient to direct persons of skill
through such in vitro amplification methods, including the
polymerase chain reaction (PCR), the ligase chain reaction
(LCR), Qβ-replicase amplification and other RNA polymerase
mediated techniques (e.g., NASBA), are found in Berger,
Sambrook, and Ausubel, as well as Mullis et al. (1987), U.S.
Patent No. 4,683,202; PCR Protocols: A Guide to Methods and
Applications (Innis et al., eds.) Academic Press Inc., San
Diego, CA (1990) (Innis); Arnheim & Levinson (October 1,
1990), Chem. Eng. News 36-47; J. NIH Res. (1991) 3:81-94;
Kwoh et al. (1989), Proc. Natl. Acad. Sci. USA 86:1173;
Guatelli et al. (1990), Proc. Natl. Acad. Sci. USA 87:1874;
Lomeli et al. (1989), J. Clin. Chem. 35:1826; Landegren et al.
(1988), Science 241:1077-1080; Van Brunt (1990), Biotechnology
8:291-294; Wu and Wallace (1989), Gene 4:560; Barringer et al.
(1990), Gene 89:117; and Sooknanan and Malek (1995),
Biotechnology 13:563-564. Improved methods of cloning
in vitro amplified nucleic acids are described in Wallace et
al., U.S. Pat. No. 5,426,039. Other methods recently
described in the art are the nucleic acid sequence based
amplification (NASBA™, Cangene, Mississauga, Ontario) and Q
Beta Replicase systems. These systems can be used to directly
identify mutants where the PCR or LCR primers are designed to
be extended or ligated only when a select sequence is present.
Alternatively, the select sequences can be generally amplified
using, for example, nonspecific PCR primers and the amplified
target region later probed for a specific sequence indicative
of a mutation.
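
The idea of a primer that is extended only when a select sequence is present can be illustrated with a toy model. The sketch below is not a description of any particular amplification protocol: it assumes that a primer is extended only if its 3'-terminal base matches the template, while a small number of internal mismatches is tolerated. The function name, the mismatch limit, and both sequences are hypothetical and are not taken from the GFP or BFP genes.

```python
def can_extend(primer: str, sense_strand: str,
               max_internal_mismatches: int = 1) -> bool:
    """Toy model of allele-specific priming.

    The primer (5'->3') is compared with every window of the sense strand.
    It counts as extendable if its 3'-terminal base matches the window and
    the number of mismatches elsewhere does not exceed the allowed limit.
    """
    n = len(primer)
    for start in range(len(sense_strand) - n + 1):
        window = sense_strand[start:start + n]
        internal = sum(p != w for p, w in zip(primer[:-1], window[:-1]))
        if internal <= max_internal_mismatches and primer[-1] == window[-1]:
            return True
    return False


# Hypothetical toy sequences differing by a single T->C change at index 8.
WILD_TYPE = "ACGTTGCATATGGCCTA"
MUTANT = "ACGTTGCACATGGCCTA"

# Mutant-specific primer: its 3' end sits on the changed base.
MUTANT_PRIMER = MUTANT[:9]

print(can_extend(MUTANT_PRIMER, MUTANT))     # True
print(can_extend(MUTANT_PRIMER, WILD_TYPE))  # False
```

In this model the mutant-specific primer yields a product only from the mutant template, which is the behaviour the paragraph above relies on for directly identifying mutants.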

Oligonucleotides for use as probes, e.g., in in
vitro amplification methods, for use as gene probes, or as
inhibitor components are typically synthesized chemically
according to the solid phase phosphoramidite triester method
described by Beaucage and Caruthers (1981), Tetrahedron Letts.
22(20):1859-1862, e.g., using an automated synthesizer, as
described in Needham-VanDevanter et al. (1984), Nucleic Acids
Res. 12:6159-6168. Purification of oligonucleotides, where
necessary, is typically performed by either native acrylamide
gel electrophoresis or by anion-exchange HPLC as described in
Pearson and Regnier (1983), J. Chrom. 255:137-149. The
sequence of the synthetic oligonucleotides can be verified
using the chemical degradation method of Maxam and Gilbert
(1980) in Grossman and Moldave (eds.) Academic Press, New
York, Methods in Enzymology 65:499-560.

An alternative means for determining the level of 
expression of the engineered GFP or BFP gene is in situ 
hybridization. In situ hybridization assays are well known 
and are generally described in Angerer et al. (1987), Methods 
Enzymol. 152:649-660. In an in situ hybridization assay cells 
are fixed to a solid support, typically a glass slide. If DNA 
is to be probed, the cells are denatured with heat or alkali. 
The cells are then contacted with a hybridization solution at 
a moderate temperature to permit annealing of engineered GFP 
or BFP specific probes that are labelled. The probes are 
preferably labelled with radioisotopes or fluorescent
reporters.



B. Fluorescence Assay 

When a fluorophore such as a protein that is capable
of fluorescing is exposed to light of an appropriate
wavelength, it will absorb and store light energy and then
release the stored light energy. The range of wavelengths that a
fluorophore is capable of absorbing is the excitation spectrum
and the range of wavelengths of light that a fluorophore is
capable of emitting is the emission or fluorescence spectrum.
The excitation and fluorescence spectra for a given
fluorophore usually differ and may be readily measured using
known instruments and methods. For example, scintillation
counters and photometers (e.g., luminometers), photographic
film, and solid state devices such as charge coupled devices
may be used to detect and measure the emission of light.

The nucleic acids, vectors, and mutant proteins provided
herein, in combination with well known techniques for
over-expressing recombinant proteins, make it possible to obtain
unlimited supplies of homogeneous mutant GFPs and BFPs. These
modified GFPs or BFPs having increased fluorescent activity
can replace wtGFP or other currently employed tracers in existing
diagnostic and assay systems. Such currently employed tracers
include radioactive atoms or molecules and color-producing
enzymes such as horseradish peroxidase.

The benefits of using the mutants of the present
invention are at least four-fold: the modified GFPs and BFPs
are safer than radioactive-based assays, modified GFPs and
BFPs can be assayed quickly and easily, and large numbers of
samples can be handled simultaneously, reducing overall
handling and increasing efficiency. Of great significance,
the expression and subcellular distribution of the fluorescent
proteins within cells can be detected in living tissues
without any other experimental manipulation than placing
the cells on a slide and viewing them through a fluorescence
microscope. This represents a vast improvement over methods
of immunodetection that require fixation and subsequent
labelling.

The modified GFPs and BFPs of the present invention
can be used in standard assays involving a fluorescent marker.
For example, ligand-ligator binding pairs that can be modified
with the mutants of the present invention without disrupting
the ability of each to bind to the other can form the basis of
an assay encompassed by the present invention. These and
other assays are known in the art and their use with the GFPs
and BFPs of the present invention will become obvious to one
skilled in the art in light of the teachings disclosed herein.
Examples of such assays include competitive assays wherein
labeled and unlabeled ligands competitively bind to a ligator,
and noncompetitive assays where a ligand is captured by a ligator
and either measured directly or "sandwiched" with a secondary
ligator that is labeled. Still other types of assays include
immunoassays, single-step homogeneous assays, multiple-step
heterogeneous assays, and enzyme assays.

In a number of embodiments, the mutant GFPs and BFPs
are combined with fluorescent microscopy using known
techniques (see, e.g., Stauber et al., Virol. 213:439-454
(1995)) or preferably with fluorescence activated cell sorting
(FACS) to detect and optionally purify or clone cells that
express specific recombinant constructs. For a brief overview
of the FACS and its uses, see Herzenberg et al., 1976,
"Fluorescence activated cell sorting", Sci. Amer. 234, 108;
see also Flow Cytometry and Sorting, eds. Melamed, Mullaney and
Mendelsohn, John Wiley and Sons, Inc., New York, 1979.
Briefly, fluorescence activated cell sorters take a suspension 
of cells and pass them single file into the light path of a 
laser placed near a detector. The laser usually has a set 
wavelength. The detector measures the fluorescent emission 
intensity of each cell as it passes through the instrument and 
generates a histogram plot of cell number versus fluorescent 
intensity. Gates or limits can be placed on the histogram 
thus identifying a particular population of cells. In one 
embodiment, the cell sorter is set up to select cells having 
the highest probe intensity, usually a small fraction of the 
cells in the culture, and to separate these selected cells 
away from all the other cells. The level of intensity at 
which the sorter is set and the fraction of cells which is 
selected, depend on the condition of the parent culture and 
the criteria of the isolation. In general, the operator 
should first sort an aliquot of the culture, and record the 
histogram of intensity versus number of cells. The operator 
can then set the selection level and isolate an appropriate 
number of the most active cells. Currently, fluorescence 
activated cell sorters are equipped with automated cell 
cloning devices. Such a device enables one to instruct the 
instrument to singly deposit a selected cell into an 
individual growth well, where it is allowed to grow into a 
monoclonal culture. Thus, genetic homogeneity is established 
within the newly cloned culture. 
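
The histogram and gating steps described above can be sketched in a few lines. This is a minimal illustration only, assuming that each cell is represented by a single non-negative intensity value; it does not reproduce the software of any actual cell sorter, and the function names, the bin count and the selected fraction are hypothetical.

```python
def intensity_histogram(intensities, n_bins, max_intensity):
    """Count cells per intensity bin (cell number versus fluorescence)."""
    counts = [0] * n_bins
    width = max_intensity / n_bins
    for value in intensities:
        # Values at or above max_intensity fall into the last bin.
        index = min(int(value / width), n_bins - 1)
        counts[index] += 1
    return counts


def gate_top_fraction(intensities, fraction):
    """Return (threshold, indices) for the brightest fraction of cells."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_select = max(1, round(len(intensities) * fraction))
    ranked = sorted(intensities, reverse=True)
    threshold = ranked[n_select - 1]
    # Ties at the threshold are all kept, so slightly more cells may pass.
    selected = [i for i, v in enumerate(intensities) if v >= threshold]
    return threshold, selected


# Hypothetical aliquot: 100 cells with intensities 1..100.
aliquot = list(range(1, 101))
print(intensity_histogram(aliquot, n_bins=4, max_intensity=100))
threshold, selected = gate_top_fraction(aliquot, 0.05)
print(threshold, len(selected))
```

Here the operator's first pass over an aliquot corresponds to building the histogram, and setting the selection level corresponds to choosing the fraction passed to the gating function.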

General Applications for the GFP Mutants

It should be self-evident that the mutant GFP and 
BFP sequences described here have unlimited uses, particularly 
as signal or reporter sequences for the co-expression of other 
nucleic acid sequences of interest and/or to track the 
location and/or movement of other sequences within the cell, 
within tissue and the like. For example, these reporter type 
sequences could be used to track the spread (or lack thereof) 
of a disease causal agent in drug screening assays or could 
readily be used in diagnostics. Some of the more interesting
applications are described below. 

A. Protein Trafficking 

Normally, expressed mutant GFPs and BFPs are
distributed throughout the cell (particularly mammalian
cells), except for the nucleolus. However, as described
below, when a GFP mutant is fused to the HIV-1 Rev protein, a
hybrid molecule results which retains the Rev function and is
localized mainly in the nucleolus where Rev is found. Fusion
to the N-terminal domain of the HIV-1 Nef protein produces a
hybrid protein detectable in the plasma membrane. Thus, the
GFP mutants can be used to monitor the subcellular targeting
and transport of proteins to which they are fused.

B. Gene Therapy 

The mutant GFPs described here have interesting and
useful applications in gene therapy. Gene therapy in general
is the correction of genetic defects by insertion of exogenous
cellular genes that encode a desired function into cells that
lack that function, such that the expression of the exogenous
gene a) corrects a genetic defect or b) causes the destruction
of cells that are genetically defective. Methods of gene
therapy are well known in the art; see, for example, Lu, M.,
et al. (1994), Human Gene Therapy 5:203; Smith, C. (1992), J.
Hematotherapy 1:155; Cassel, A., et al. (1993), Exp. Hematol.
21:585; Larrick, J.W. and Burck, K.L., Gene Therapy:
Application of Molecular Biology, Elsevier Science Publishing Co.,
Inc., New York, New York (1991) and Kriegler, M., Gene Transfer
and Expression: A Laboratory Manual, W.H. Freeman and Company, New
York (1990), each incorporated herein by reference. One
modality of gene therapy involves (a) obtaining from a patient
a viable sample of primary cells of a particular cell type;
(b) inserting into these primary cells a nucleic acid segment
encoding a desired gene product; (c) identifying and isolating
cells and cell lines that express the gene product; (d)
re-introducing cells that express the gene product; (e) removing
from the patient an aliquot of tissue including cells
resulting from step c and their progeny; and (f) determining
the quantity of the cells resulting from step c and their
progeny, in said aliquot. The introduction into cells in step
c of a polycistronic vector that encodes GFP or BFP in
addition to the desired gene allows for the quick
identification of viable cells that contain and express the
desired gene.

Another gene therapy modality involves inserting the
desired nucleic acid into selected tissue cells in situ, for
example into cancerous or diseased cells, by contacting the
target cells in situ with retroviral vectors that encode the
gene product in question. Here, it is important to quickly
and reliably assess which and what proportion of cells have
been transfected. Co-expression of GFP and BFP permits a
quick assessment of the proportion of cells that are
transfected, and of the levels of expression.

C. Diagnostics 

One potential application of the GFP/BFP variants is 
in diagnostic testing. The GFP/BFP gene, when placed under 
the control of promoters induced by various agents, can serve 
as an indicator for these agents. Established cell lines or 
cells and tissues from transgenic animals carrying GFP/BFP 
expressed under the desired promoter will become fluorescent 
in the presence of the inducing agent.

Viral promoters which are transactivated by the
corresponding virus, promoters of heat shock genes which are
induced by various cellular stresses, as well as promoters
which are sensitive to organismal responses, e.g.
inflammation, can be used in combination with the described
GFP/BFP mutants in diagnostics.

In addition, the effect of selected culture
conditions and components (salt concentrations, pH,
temperature, trans-acting regulatory substances, hormones,
cell-cell contacts, ligands of cell surface and internal
receptors) can be assessed by incubating cells in which
sequences encoding the fluorescent proteins provided herein
are operably linked to nucleic acids (especially regulatory
elements such as promoters) derived from a selected gene, and
detecting the expression and location of fluorescence.

D. Toxicology 

Another application of the GFP/BFP-based 
methodologies is in the area of toxicology. Assessment of the 
mutagenic potential of any compound is a prerequisite for its 
use. Until recently, the Ames assay in Salmonella and tests 
based on chromosomal aberrations or sister chromatid exchanges 
in cultured mammalian cells were the main tools in toxicology. 
However, both assays are of limited sensitivity and 
specificity and do not allow studies on mutation induction in 
various organs or tissues of the intact organism. 

The introduction of transgenic mice with a 
mutational target in a shuttle vector has made possible the 
detection of induced mutations in different tissues in vivo. 
The assay involves DNA isolation from tissues of exposed mice, 
packaging of the target DNA into bacteriophage lambda 
particles and subsequent infection of E. coli. The mutational 
target in this assay is either the lacZ or lacI genes and
quantitation of blue vs white plaques on the bacterial lawn 
allows for mutagenic assessment. 

GFP/BFP could significantly simplify both the tissue 
culture and transgenic mouse procedures. Expression of 
GFP/BFP under the control of a repressor, which in turn is 
driven by the promoter of a constitutively expressed gene, 
will establish a rapid method for evaluating the mutagenic 
potential of an agent. The presence of fluorescent cells, 
following exposure of a cell line, tissue or whole animal 
carrying the GFP/BFP-based detection construct, will reflect 
the mutagenicity of the compound in question. GFP/BFP 
expressed under the control of the target DNA, the repressor 
gene, will only be synthesized when the repressor is 
inactivated or turned off or the repressor recognition 
sequences are mutated. Direct visualization of the detector 
cell line or tissue biopsy can qualitatively assess the 
mutagenicity of the agent, while FACS of the dissociated cells 
can provide for quantitative analysis. 

E. Drug Screening

The GFP/BFP detection system could also 
significantly expedite and reduce the cost of some current 
drug screening procedures. A dual color screening system
(DCSS), in which GFP is placed under the promoter of a target 
gene and BFP is expressed from a constitutive promoter, could 
provide for rapid analysis of agents that specifically affect 
the target gene. Established cell lines with the DCSS could 
be screened with hundreds of compounds in a few hours. The
desired drug will only influence the expression of GFP. 
Non-specific or cytotoxic effects will be detected by the 
second marker, BFP. The advantages of this system are that no 
exogenous substances are required for GFP and BFP detection, 
the assay can be used with single cells, cell populations, or 
cell extracts, and that the same detection technology and 
instrumentation is used for very rapid and non-destructive 
detection. 
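
The logic of the dual color screening system can be sketched as a simple decision rule. The example below is only an illustration under stated assumptions: the signals are taken to be background-corrected fluorescence readings, each is compared with an untreated control, and a compound is called target-specific when only the GFP signal changes. The function name and the 20% tolerance are hypothetical and would need to be set empirically.

```python
def classify_compound(gfp, bfp, gfp_control, bfp_control, tolerance=0.2):
    """Classify a compound from GFP (target promoter) and BFP
    (constitutive promoter) signals relative to untreated controls."""
    if gfp_control <= 0 or bfp_control <= 0:
        raise ValueError("control signals must be positive")
    gfp_changed = abs(gfp / gfp_control - 1.0) > tolerance
    bfp_changed = abs(bfp / bfp_control - 1.0) > tolerance
    if bfp_changed:
        # The constitutive marker moved: non-specific or cytotoxic effect.
        return "non-specific or cytotoxic"
    if gfp_changed:
        # Only the target-driven marker moved.
        return "target-specific"
    return "no effect"


# Hypothetical readings (arbitrary units) against controls of 100.
print(classify_compound(20, 95, 100, 100))   # target-specific
print(classify_compound(20, 30, 100, 100))   # non-specific or cytotoxic
print(classify_compound(98, 102, 100, 100))  # no effect
```

Because both markers are read with the same instrumentation, the same rule could in principle be applied per well, per cell population, or per single cell.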

The search for antiviral agents which specifically 
block viral transcription without affecting cellular 
transcription, could be significantly improved by the DCSS. 
In the case of HIV, appropriate cell lines expressing GFP 
under the HIV LTR and BFP under a cellular constitutive 
promoter, could identify compounds which selectively inhibit 
HIV transcription. Reduction of only the green but not the 
blue fluorescent signal will indicate drug specificity for the 
HIV promoter. Similar approaches could also be designed for 
other viruses.

Furthermore, the search for antiparasitic agents
could also be helped by the DCSS. Established cell lines or
transgenic nematodes or even parasitic extracts where
expression of GFP depends on parasite-specific trans splicing
sequences while BFP is under the control of host-specific cis
splicing elements, could provide for rapid screening of selective
antiparasitic drugs.

The invention will be more readily understood by 
reference to the following specific examples which are 
included for purposes of illustration only and are not
intended to limit the invention unless so stated. 



EXAMPLES 



The following general protocol was used to generate 
mutant GFP- or BFP-encoding nucleic acids, transform host 
cells, and express the mutant GFP and BFP proteins: 

• Clone a nucleic acid that encodes either wtGFP or
BFP (Tyr67→His), under the control of eukaryotic or
prokaryotic promoters, into a standard ds-DNA plasmid

• Convert the plasmid vector to a ss-DNA by standard
methods

• Anneal the ss-DNA to 40-50 nucleotide DNA oligomers
having base mismatches at the site(s) intended to be
engineered

• Convert the ss-DNA to a closed ds-DNA plasmid vector by
use of DNA polymerase and standard protocols

• Identify plasmids containing the desired mutations by
restriction analysis following plasmid DNA isolation from
E. coli strains transformed with the mutagenized DNA

• Verify the presence of mutations by DNA sequencing

• Transfect human transformed embryonic kidney 293 cells
with equal amounts of DNA from the appropriate plasmids

• Compare the fluorescence intensity of the signals

Nucleic acids and vectors 

The wtGFP cDNA (SEQ ID NO:1) was obtained from Dr.
Chalfie of Columbia University. All mutants described were
obtained by modifying this wtGFP sequence as detailed below.

The vectors used to clone and to express the GFPs
and BFPs are derivatives of the commercially available
plasmids pcDNA3 (Invitrogen, San Diego, CA), pBSSK+
(Stratagene, La Jolla, CA) and pET11a (Novagen, Madison, WI).

wtGFP protein expression in mammalian cells

Several vectors for the expression of GFP in 
mammalian cells were constructed: 

pFRED4 carries the wtGFP sequences under the control of the
cytomegalovirus (CMV) early promoter and the polyadenylation
signal of the Human Immunodeficiency Virus-1 (HIV) 3' Long
Terminal Repeat (LTR). To derive pFRED4 we amplified the GFP
coding sequence from plasmid #TU58 (Chalfie et al., 1994) by
the polymerase chain reaction (PCR). For PCR amplification of
the GFP coding region, oligonucleotides #16417 and #16418 were
used as primers. Oligonucleotide #16417:
5'-GGAGGCGCGCAAGAAATGGCTAGCAAAGGAGAAGA-3' (SEQ ID NO:3),
containing the BssHII recognition sequence and the translation
initiation sequence of the HIV-1 Tat protein, was the sense
primer. The antisense primer, #16418:
5'-GCGGGATCCTTATTTGTATAGTTCATCCATGCCATG-3' (SEQ ID NO:4),
contained the BamHI recognition sequence. The amplified
fragment was digested with BssHII and BamHI and cloned into
BssHII and BamHI digested pCMV37M1-10D, a plasmid containing
the CMV early promoter and the HIV-1 p37gag region, followed
by several cloning sites and the HIV-1 3' LTR. Thus the
p37gag gene was replaced by GFP, resulting in pFRED4.
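
As a quick consistency check on the two primers, the sketch below searches each sequence, as printed above, for the BssHII (GCGCGC) and BamHI (GGATCC) recognition sequences and computes the reverse complement of the antisense primer. The helper names are hypothetical, and the check assumes the printed sequences are free of transcription errors.

```python
# Primer sequences as printed in the text, written 5'->3'.
PRIMER_16417 = "GGAGGCGCGCAAGAAATGGCTAGCAAAGGAGAAGA"   # sense
PRIMER_16418 = "GCGGGATCCTTATTTGTATAGTTCATCCATGCCATG"  # antisense

# Recognition sequences of the two enzymes named in the text.
SITES = {"BssHII": "GCGCGC", "BamHI": "GGATCC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str) -> dict:
    """Map each enzyme name to the 0-based index of its site (-1 if absent)."""
    return {name: seq.find(site) for name, site in SITES.items()}


print(find_sites(PRIMER_16417))  # {'BssHII': 4, 'BamHI': -1}
print(find_sites(PRIMER_16418))  # {'BssHII': -1, 'BamHI': 3}
print(reverse_complement(PRIMER_16418))
```

Read on the sense strand, the reverse complement of #16418 places a TAA stop codon immediately before the BamHI site, which is what one would expect for a primer that closes the coding region.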

In a second step, the 1485 bp fragment from pFRED4,
generated from StuI and BamHI double digestion, was subcloned
into the 4747 bp vector derived from the NruI and BamHI double
digestion of pcDNA3. The resulting plasmid, pFRED7 (SEQ ID
NO:5), expresses GFP under the control of the early CMV
promoter and the bovine growth hormone polyadenylation signal.

Bacterial expression 

For bacterial expression, we constructed plasmid
pBSGFP (SEQ ID NO:6), a pBSSK+ derivative carrying wtGFP.
pBSGFP was generated by inserting the GFP containing region of
pFRED4, digested with BssHII and BamHI and subsequently
treated with Klenow, into the EcoRV digested pBSSK+ vector.
In pBSGFP the wtGFP is fused downstream to the 43 amino acids
of the alpha peptide of beta galactosidase, present in the
pBSSK+ polylinker region. The added amino acids at the
N-terminus of wtGFP have no apparent effect on the GFP signal,
as judged from subsequent plasmids containing precise
deletions of the extra amino acids.

For GFP overexpression and purification we generated
plasmid pFRED13 (SEQ ID NO:7) by ligating the 717 bp fragment
from pFRED7 digested with NheI and BamHI, to the 5644 bp
fragment resulting from the NheI and BamHI double digestion of
pET11a. In pFRED13, GFP is synthesized under the control of
the bacteriophage T7 phi10 promoter.

The oligonucleotides used for GFP mutagenesis were 
synthesized by the DNA Support Services of the ABL Basic 
Research Program of the National Cancer Institute. DNA 
sequencing was performed by the PCR-assisted fluorescent 
terminator method (ReadyReaction DyeDeoxy Terminator Cycle 
Sequencing Kit, ABI, Columbia, MD) according to the
manufacturer's instructions. Sequencing reactions were
resolved on the ABI Model 373A DNA Sequencing System.
Sequencing data were analyzed using the Sequencher program 
(Gene Codes, Ann Arbor, MI) . 

Enzymes were purchased from New England Biolabs 
(Beverly, MA) and used according to conditions described by 
the supplier. Chemicals used for the purification of wild 
type and mutant proteins were purchased from SIGMA (St. Louis, 
MO). Tissue culture media were obtained from Biofluids
(Rockville, MD) and GIBCO/BRL (Gaithersburg, MD). Competent
bacterial cells were purchased from GIBCO/BRL. 

Preparation of mutants 

Initially, plasmid pBSGFP was used to mutagenize the 
GFP coding sequence by single-stranded DNA site directed 
mutagenesis, as described by Schwartz et al. (1992) J. Virol.
66:7176. In addition to changing specific codons, our
strategy was also to improve GFP expression by replacing
potential inhibitory nucleotide sequences without altering the
GFP amino acid sequence. This approach has been successfully
employed in the past for other proteins (Schwartz et al.
(1992) J. Virol. 66:7176).
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For the pBSGFP mutagenesis the following 
oligonucleotides were used: 
#17422 (SEQ ID NO:8) : 

5 • - CAATTTGTGTCCCAGAATGTTGCCATCTTCCTTGAAGTCAATACCTTT- 3 « 
#17423 (SEQ ID NO: 9) : 

5 1 - GTCTTGTAGTTGCCGTCATCTTTGAAGAAGATGCTCCTTTCCTGTAC - 3 ' 
#17424 (SEQ ID NO: 10) : 

5 ' -CATGGAACAGGCAGTTTGCCAGTAGTGCAGATGAACTTCAGGGTAAGTTTTC - 3 • 
#17425 (SEQ ID NO: 11) : 

5 ' - CTCCACTGACAGAGAACTTGTGGCCGTTAACATCACCATC - 3 1 
#17426 (SEQ ID NO: 12) : 

5 ' - CCATCTTCAATGTTGTGGCGGGTCTTGAAGTTCACTTTGATTCCATT- 3 * 
#17465 (SEQ ID NO: 13) : 

5 ' -CGATAAGCTTGAGGATCCTCAGTTGTACAGTTCATCCATGC-3 ' 

Oligonucleotide #17426 introduces a mutation in GFP, 
converting the isoleucine (Ile) at position 168 into threonine 
(Thr). The Ile168Thr change has been shown to alter the GFP 
spectrum and to also increase the intensity of GFP 
fluorescence by almost two-fold at the emission maxima (Heim 
et al. (1994), supra). 
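
Because the mutagenic oligonucleotides are written in the antisense orientation, the changes they encode are hard to read by eye. The following Python sketch reverse-complements oligonucleotide #17426 and compares it, codon by codon, with wild-type codons 160-175 transcribed from SEQ ID NO: 1. The alignment to codon 160 and all helper names are assumptions made for illustration; they are not part of the original disclosure.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def codons(seq):
    """Split a sequence into complete codons, dropping any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Oligonucleotide #17426 (SEQ ID NO: 12), antisense strand as listed.
OLIGO_17426 = "CCATCTTCAATGTTGTGGCGGGTCTTGAAGTTCACTTTGATTCCATT"
# Wild-type codons 160-175, transcribed from SEQ ID NO: 1.
WT_160_175 = "AATGGAATCAAAGTTAACTTCAAAATTAGACACAACATTGAAGATGGA"
# Assumption: the sense strand of the oligonucleotide starts at codon 160.
FIRST_CODON = 160

sense = reverse_complement(OLIGO_17426)
changes, silent = [], []
for offset, (wt, mut) in enumerate(zip(codons(WT_160_175), codons(sense))):
    if wt == mut:
        continue
    position = FIRST_CODON + offset
    if CODON_TABLE[wt] == CODON_TABLE[mut]:
        silent.append(position)          # codon changed, amino acid unchanged
    else:
        changes.append(f"{CODON_TABLE[wt]}{position}{CODON_TABLE[mut]}")

print("amino acid changes:", changes)
print("silent codon changes at positions:", silent)
```

Under these assumptions the sketch reports I168T as the only amino acid change and silent codon changes at positions 164, 167 and 169, which matches the stated strategy of combining a codon change with replacements that leave the amino acid sequence unaltered.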

The mutagenesis mixture was used to transform DH5α 
competent E. coli cells. Ampicillin-resistant colonies were 
obtained and examined for their fluorescent properties by 
excitation with UV light. One colony, significantly brighter 
than the rest, was apparent on the agar plate. This colony 
was further purified, and the plasmid DNA was isolated and used to 
transform DH5α competent bacteria. This time all the colonies 
were bright green when excited with UV light, indicating 
that the bright green fluorescence was associated with the 
presence of the plasmid. The sequence of the GFP segment 
(SEQ ID NO: 14, representing only the segment and not the whole 
plasmid) of this plasmid, called pBSGFPsg11, was then 
determined. The sequence analysis revealed that in addition 
to the designed nucleotide changes, which do not alter the 
amino acid sequence of GFP, and the Ile168Thr mutation, a 
second spontaneous mutation had occurred. A thymidine at 
position 322 of SEQ ID NO: 14, which is the GFP-coding region 
of the pBSGFPsg11 DNA, was replaced by a cytosine. This 
nucleotide change converts the phenylalanine (Phe) at position 
65 of the GFP amino acid sequence into a leucine (Leu). A 
series of experiments, which will be described below, 
demonstrated that the Phe65Leu mutation was indeed responsible 
for the increase in the intensity of the fluorescent GFP 
signal. 

In subsequent experiments, involving generation of 
rationally designed GFP mutant combinations to be detailed 
below, we also used the single-stranded DNA site-directed 
mutagenesis approach. This time, however, the template DNAs 
were pFRED7 derivatives instead of pBSGFP. 

Transfection and expression 

The 293 cell line, an adenovirus-transformed human 
embryonal kidney cell line (Graham et al. (1977), J. Gen. 
Virol. 5:59), was used for protein expression analysis. The 
cells were cultured in Dulbecco's modified culture medium 
(DMEM) supplemented with 10% heat-inactivated fetal bovine 
serum (FBS, Biofluids). 

Transfection was performed by the calcium phosphate 
coprecipitation technique as previously described (Graham et 
al. (1973), Virol. 52:456; Felber et al. (1990), J. Virol. 
64:3734). Plasmid DNA was purified on Qiagen columns according 
to the manufacturer's instructions (Qiagen). A mix of 5 to 10 
µg of total DNA per ml of final precipitate was overlaid on 
the cells in 60 mm or 6- and 12-well tissue culture plates 
(Falcon), using 0.5, 0.25 and 0.125 ml of precipitate, 
respectively. After overnight incubation, the cells were 
washed, placed in medium without phenol red and measured in a 
plate spectrofluorometer, e.g., Cytofluor II (Perceptive 
Biosystems, Framingham, MA). 
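
The DNA concentration and precipitate volumes given above imply a range of DNA delivered to each culture format. The short Python sketch below works through that arithmetic; the dictionary keys and function name are illustrative assumptions, and only the concentrations and volumes come from the text.

```python
# Precipitate volume (ml) overlaid per culture format, as stated in the text.
PRECIPITATE_ML = {
    "60 mm dish": 0.5,
    "6-well plate (per well)": 0.25,
    "12-well plate (per well)": 0.125,
}
# Total DNA per ml of final precipitate (lower and upper bound, in µg).
DNA_UG_PER_ML = (5.0, 10.0)


def dna_range_ug(volume_ml):
    """Return the (minimum, maximum) µg of DNA delivered in a given volume."""
    low, high = DNA_UG_PER_ML
    return (low * volume_ml, high * volume_ml)


for culture_format, volume in PRECIPITATE_ML.items():
    low, high = dna_range_ug(volume)
    print(f"{culture_format}: {low:g} to {high:g} µg DNA")
```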

Purification of wild-type and mutant proteins 

E. coli strains carrying pFRED13 or other pET11a 
derivatives with mutant GFP genes were used for the 
overproduction and purification of the wt and mutant GFPs or 
BFPs. The cells were grown in 1 liter LB broth containing 100 
µg/ml ampicillin at 32°C to a density of 0.6-0.8 optical 
density units at 600 nm. At this point, the cells were 
induced with 0.6 mM IPTG and incubated for four more hours. 
Following harvesting of the cell pellets, cellular extracts 
were prepared as described by Johnson, B.H. and Hecht, M.H., 
1994, Biotechnol. 12:1357. 

GFPs and BFPs were purified from the cellular 
extracts as follows: Ammonium sulfate (AS) was added first to 
the extracts (50 g AS per 100 g supernatant) to precipitate the 
proteins. The precipitates were collected by centrifugation 
at 7500 x g for 15 min and the pellets were dissolved in 5 ml 
of 1 M AS. The samples were then loaded on a phenyl-Sepharose 
column (HR10/10, Pharmacia, Piscataway, NJ) and washed with 20 
mM 2-[N-morpholino]ethanesulfonic acid (MES) pH 5.6 and 1 M 
AS. Proteins were eluted with a 45 ml gradient to 20 mM MES, 
pH 5.6. Fractions containing the GFP or BFP protein were 
colored even under visible light. 

Green or blue-colored fractions were further 
purified on Q-Sepharose (Mono Q, HR5/5, Pharmacia) with a 20 
ml gradient from 20 mM Tris pH 7.0 to 20 mM Tris pH 7.0, 0.25 
M NaCl. 

The AS precipitation step was performed at 4°C 
while the chromatographic procedures were performed at room 
temperature. 

Determination of protein concentration 

Protein concentrations were determined using the 
commercially available Bradford protein assay (BioRad, 
Hercules, CA) with bovine IgG protein as a standard. 

Analytical polyacrylamide gels 

Analytical polyacrylamide gel electrophoresis was 
used to visualize the degree of purity of the purified GFP or 
BFP proteins. In all cases, 1 mm thick, 12% acrylamide gels 
(containing 0.1% SDS, in Tris buffer, pH 7.4) were used, and 
electrophoresis was performed for 2 hours at 120 V. Gels were 
stained with Coomassie Blue to visualize the proteins. 

Fluorescence measurements 

Excitation and emission spectra of solutions of the 
fluorescent proteins were obtained using a Perkin Elmer LS50B 
spectrofluorimeter (Perkin Elmer, Advanced Biosystems, Foster 
City, CA). 

The relative fluorescence data for the GFP mutants 
in Table I below were obtained by comparing the cellular 
fluorescence of the GFP mutants expressed in the transformed 
human embryonic kidney cell line 293 with wtGFP expressed in 
the same cell line. Likewise, the relative fluorescence data 
for the BFP mutants in Table I below were obtained by 
comparing the cellular fluorescence of the BFP mutants 
expressed in 293 cells with BFP(Tyr67→His) expressed in the 
same cell line. Equal amounts of DNA encoding wild type or 
mutant proteins were introduced into 293 cells. Cellular 
fluorescence was quantified 24 h or 48 h post-transfection 
using Cytofluor II. 
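
The comparison described above reduces to a ratio of plate-reader signals. The Python sketch below shows one way such a ratio could be computed. The background subtraction (for example, a reading from mock-transfected cells) and the numeric readings are assumptions added for illustration; the text only states that mutant and reference fluorescence were compared in the same cell line.

```python
def relative_fluorescence(sample, reference, background=0.0):
    """Fold fluorescence of a mutant relative to its reference protein.

    sample     -- reading from cells expressing the mutant
    reference  -- reading from cells expressing wtGFP or BFP(Tyr67->His)
    background -- reading from control cells (assumed; not in the source)
    """
    corrected_reference = reference - background
    if corrected_reference <= 0:
        raise ValueError("reference signal must exceed background")
    return (sample - background) / corrected_reference


# Hypothetical readings in arbitrary fluorescence units.
fold = relative_fluorescence(sample=5700.0, reference=300.0, background=100.0)
print(f"relative cellular fluorescence: {fold:.1f}X")
```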

A list of GFP mutant proteins indicating the 
introduced amino acid mutations is shown in Table I. 



TABLE I: GFP and BFP mutants 

                    Amino Acid Position 
PROTEIN     65    66    67    164   168   239 
wt GFP      F     S     Y     V     I     K 
SG12        L 
SG11        L                       T     N 
SG25        L     C                 T     N 
BFP                     H 
SB42        L           H 
SB49                    H     A 
SB50        L           H     A 







Example 1: SG12 

A number of the unique mutants described herein 
derive from the discovery of an unplanned and unexpected 
mutation called "SG12", obtained in the course of 
site-directed mutagenesis experiments, wherein a phenylalanine at 
position 65 of wtGFP was converted to leucine. SG12 was 
prepared as follows: Two plasmids carrying SG12 (SEQ ID NO: 15) 
were generated, pFRED12 for expression in mammalian cells, and 
pFRED16 for expression in E. coli and protein purification. 
pFRED12 was constructed by ligating the 1557 bp fragment from 
the double digestion of pFRED7 with Avr II and Pml I into the 
4681 bp fragment generated from the Avr II and Pml I digestion 
of pFRED11 (see below). pFRED16 was derived by subcloning the 
717 bp segment resulting from the digestion of pFRED12 with 
NheI and BamHI to the 5644 bp fragment of the pET11a vector 
digested with the same restriction enzymes. 

The specific activity of SG12 was about 9-12 times 
that of wtGFP. See Table II. 

Example 2: SG11 

A mutant referred to as "SG11," which combined the 
phenylalanine 65 to leucine alteration with an isoleucine 168 
to threonine substitution and a lysine 239 to asparagine 
substitution, gave a further enhanced fluorescence intensity. 
SG11 was prepared as follows: Two plasmids carrying SG11 (SEQ 
ID NO: 16) were generated: pFRED11 for expression in mammalian 
cells and pFRED15 for expression in E. coli and protein 
purification. pFRED11 was constructed by ligating the 717 bp 
region from pBSGFPsg11 DNA digested with NheI and BamHI to the 
5221 bp fragment derived from the digestion of pFRED7 with the 
same enzymes. pFRED15 was generated by subcloning the 717 bp 
segment resulting from the digestion of pFRED11 with NheI and 
BamHI to the 5644 bp fragment of the pET11a vector, digested 
with the same restriction enzymes. 

The mutant SG11 encodes an engineered GFP wherein 
the alteration comprises the conversion of phenylalanine 65 to 
leucine and the conversion of isoleucine 168 to threonine. 
The additional alteration of the C-terminal Lys 239 to Asn is 
without effect; the C-terminal Lys or Asn may be deleted 
without affecting fluorescence. The specific activity of SG11 
is about 19-38 times that of wtGFP. See Table II. 

Example 3: SG25 

A third and further improved GFP mutant was obtained 
by further mutating "SG11." This mutant is referred to as 
"SG25" and comprises, in addition to the SG11 substitutions, 
an additional substitution of a cysteine for the serine 
normally found at position 66 in the sequence. SG25 was 
prepared as follows: Two plasmids carrying SG25 (SEQ ID NO: 17) 
were generated: pFRED25 for expression in mammalian cells and 
pFRED63 for expression in E. coli and protein purification. 
pFRED25 was constructed by site-directed mutagenesis of 
pFRED11, using oligonucleotide #18217 (SEQ ID NO: 18): 
5'-CATTGAACACCATAGCACAGAGTAGTGACTAGTGTTGGCC-3'. This 
oligonucleotide incorporates the Ser66Cys mutation into SG11. 
Ser66Cys had been shown both to alter the GFP excitation 
maxima without significant change in the emission spectrum and 
to increase the intensity of the fluorescent signal of 
GFP (Heim et al., 1995). 

pFRED63 was generated by subcloning the 717 bp 
segment resulting from the digestion of pFRED25 with NheI and 
BamHI to the 5644 bp fragment of the pET11a vector, digested 
with the same restriction enzymes. 

The mutant SG25 encodes an engineered GFP wherein 
the alteration comprises the conversion of phenylalanine 65 to 
leucine, the conversion of isoleucine 168 to threonine and the 
conversion of serine 66 to cysteine. As with SG11, the 
additional alteration of the C-terminal lysine 239 to 
asparagine is without effect; the C-terminal lysine or 
asparagine may be deleted without affecting fluorescence. The 
specific activity of SG25 is about 56 times that of wtGFP. 
See Table II. 

Example 4: Additional green fluorescent mutants 

Additional alterations at different amino acids of 
the wtGFP, when combined with SG11 and SG25, yielded proteins 
having at least 5-fold greater cellular fluorescence compared to 
the wtGFP. A non-limiting list of these mutations is provided 
below: 

GFP variants with enhanced cellular fluorescence 

Protein        Altered Amino Acids 
[illegible]    F65L, S66T, I168T, K239N 
[illegible]    F65L, S66A, I168T, K239N 
[illegible]    Y40L, F65L, I168T, K239N 
[illegible]    F47L, F65L, I168T, K239N 
[illegible]    F72L, F65L, I168T, K239N 
[illegible]    F65L, I168T, Y201L, K239N 
SG46           F65L, V164A, I168T, K239N 
[illegible]    F65L, S66C, V164A, I168T, K239N 
SG91           F65L, S66C, F100L, I168T, K239N 
SG94           F65L, S66C, Y107L, I168T, K239N 
SG95           F65L, S66C, F115L, I168T, K239N 
SG96           F65L, S66C, F131L, I168T, K239N 
SG98           F65L, S66C, Y146L, I168T, K239N 
SG100          F65L, S66C, Y152L, I168T, K239N 
SG101          F65L, S66C, I168T, Y183L, K239N 
SG102          F65L, S66C, I168T, F224L, K239N 
SG103          F65L, S66C, I168T, Y238L, K239N 
SG106          F65L, S66T, V164A, I168T, K239N 
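
The substitution notation used in the tables (for example F65L) can be checked mechanically against the wild-type sequence. The Python sketch below applies a set of substitutions to the wtGFP protein sequence transcribed from SEQ ID NO: 2 and rejects any label whose stated wild-type residue does not match. The function and variable names are assumptions made for illustration, and only a few of the listed variants are shown.

```python
import re

# Wild-type GFP (239 residues), transcribed from SEQ ID NO: 2, one-letter code.
WT_GFP = (
    "MASKGEELFTGVVPIL" "VELDGDVNGHKFSVSG" "EGEGDATYGKLTLKFI"
    "CTTGKLPVPWPTLVTT" "FSYGVQCFSRYPDHMK" "RHDFFKSAMPEGYVQE"
    "RTIFFKDDGNYKTRAE" "VKFEGDTLVNRIELKG" "IDFKEDGNILGHKLEY"
    "NYNSHNVYIMADKQKN" "GIKVNFKIRHNIEDGS" "VQLADHYQQNTPIGDG"
    "PVLLPDNHYLSTQSAL" "SKDPNEKRDHMVLLEF" "VTAAGITHGMDELYK"
)

# A label such as "F65L": wild-type residue, 1-based position, new residue.
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_mutations(sequence, mutations):
    """Return the sequence with each substitution applied after validation."""
    residues = list(sequence)
    for label in mutations:
        match = MUTATION.fullmatch(label)
        if match is None:
            raise ValueError(f"unrecognized mutation label: {label!r}")
        old, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != old:
            raise ValueError(
                f"{label}: sequence has {residues[position - 1]} at {position}"
            )
        residues[position - 1] = new
    return "".join(residues)


# A subset of the variants listed in the tables.
VARIANTS = {
    "SG12": ["F65L"],
    "SG11": ["F65L", "I168T", "K239N"],
    "SG25": ["F65L", "S66C", "I168T", "K239N"],
    "SG95": ["F65L", "S66C", "F115L", "I168T", "K239N"],
    "SG106": ["F65L", "S66T", "V164A", "I168T", "K239N"],
}

for name, mutations in VARIANTS.items():
    mutant = apply_mutations(WT_GFP, mutations)
    # Show residues 63-69, the region around the chromophore positions 65-67.
    print(name, mutant[62:69])
```

Validating the wild-type residue before substituting is a deliberate choice: it catches numbering offsets between a mutation label and the reference sequence before they propagate into a construct design.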



Example 5: SB42 

The blue fluorescent proteins described here and 
below were derived from the known GFP mutant (Heim et al., 
PNAS, 1994) wherein histidine is substituted for tyrosine at 
position 67. We have designated this known mutant 
BFP(Tyr67→His). BFP(Tyr67→His) has a shifted emission 
spectrum: it emits blue light, i.e., it is a blue fluorescent 
protein (BFP). 

By introducing the same mutation in BFP(Tyr67→His) 
that was used to generate SG12, i.e., leucine for 
phenylalanine at position 65, we created a new mutant that has 
unexpectedly high fluorescence that we refer to as 
"SuperBlue-42" (SB42). SB42 was prepared as follows: Two plasmids 
carrying SB42 (SEQ ID NO: 19) were generated: pFRED42 for 
expression in mammalian cells and pFRED65 for expression in E. 
coli and protein purification. pFRED42 was constructed by 
site-directed mutagenesis of pFRED12, using oligonucleotide 
#bio25 (5'-CATTGAACACCATGAGAGAGAGTAGTGACTAGTGTTGGCC-3') (SEQ ID 
NO: 20). This oligonucleotide incorporates the Tyr67→His 
mutation into SG12, thus generating the Phe65Leu, Tyr67→His 
double mutant. 

pFRED65 was created by subcloning the 717 bp segment 
resulting from the digestion of pFRED42 with NheI and BamHI to 
the 5644 bp fragment of the pET11a vector, digested with the 
same restriction enzymes. 

The mutant SB42 encodes an engineered BFP wherein 
the alterations comprise the conversion of tyrosine 67 to 
histidine and the conversion of phenylalanine 65 to leucine. 
The specific activity of SB42 is about 27 times that of 
BFP(Tyr67→His). See Table II. 

Example 6: SB49 

An independent mutation of BFP(Tyr67→His) which 
substitutes the valine at position 164 with an alanine is 
referred to as "SB49." SB49 was prepared as follows: Plasmid 
pFRED49 expresses SB49 (SEQ ID NO: 21) in mammalian cells. 
pFRED49 was generated by site-directed mutagenesis of pFRED12, 
using oligonucleotides #19059 and #bio24. Oligonucleotide 
#19059 (5'-CTTCAATGTTGTGGCGGATCTTGAAGTTCGCTTTGATTCCATTC-3') 
(SEQ ID NO: 22) introduces the Val164Ala mutation in SG12 while 
oligonucleotide #bio24 
(5'-CATTGAACACCATGAGAGAAAGTAGTGACTAGTGTTGGCC-3') (SEQ ID NO: 23) 
reverts the Phe65Leu alteration to the wt sequence and, at the 
same time, incorporates the Tyr67→His mutation. 

The mutant SB49 encodes an engineered BFP wherein 
the alterations comprise the conversion of tyrosine 67 to 
histidine, and the conversion of valine 164 to alanine. The 
specific activity of SB49 was about 37 times that of 
BFP(Tyr67→His). See Table II. 

Example 7: SB50 

A combination of the above two BFP mutations 
resulted in "SB50," which gave an even greater fluorescence 
enhancement than either of the previous mutations. SB50 was 
prepared as follows: Two plasmids carrying SB50 (SEQ ID NO: 
24) were generated: pFRED50 for expression in mammalian cells 
and pFRED67 for expression in E. coli and protein 
purification. pFRED50 was constructed by site-directed 
mutagenesis of pFRED12, using oligonucleotides #19059 and 
#bio25. 

pFRED67 was created by subcloning the 717 bp segment 
resulting from the digestion of pFRED50 with NheI and BamHI to 
the 5644 bp fragment of the pET11a vector digested with the 
same restriction enzymes. 

The mutant SB50 encodes an engineered BFP wherein 
the alterations comprise the conversion of tyrosine 67 to 
histidine, the conversion of phenylalanine 65 to leucine and 
the conversion of valine 164 to alanine. The specific 
activity of SB50 was about 63 times that of BFP(Tyr67→His). 
See Table II. 



TABLE II 

Mutant   Excitation   Emission   Factor of increased      Factor of increased 
         Maximum      Maximum    green fluorescence       blue fluorescence 
         (nm)         (nm)       (at maximum emission)    (at maximum emission) 
                                 as compared to           as compared to 
                                 wtGFP                    BFP(Tyr67→His) 
SG12     398          509        9-12X 
SG11     471          508        19-38X 
SG25     473          509        50-100X 
SB42     387          450                                 27X 
SB49     387          450                                 37X 
SB50     387          450                                 63X 



The dramatic increase in fluorescent activity 
resulting from the amino acid substitutions of the present 
invention was wholly unexpected. The cellular fluorescence of 
the mutants was at least five times greater, and usually over 
twenty times greater, than that of the parent wtGFP or 
BFP(Tyr67→His). Note that the maximum emission wavelengths 
vary among the mutants, and that the above-reported fold 
increases refer only to minimal increases in relative cellular 
fluorescence at the maximum emission wavelength of the mutant. 
Given a particular wavelength, the values may be substantially 
larger, i.e., the mutants may have a 200-fold greater cellular 
fluorescence than the reference wtGFP or BFP(Tyr67→His). This 
is important because devices for measuring fluorescence often 
have set wavelengths, or the limitations of a given experiment 
often require the use of a set wavelength. Thus, for example, 
the emission and detection parameters of a fluorescence 
microscope or a fluorescence-activated cell sorter may be set 
for a wavelength wherein the cellular fluorescence of a given 
mutant is 200-fold greater than that of the known GFPs and 
BFPs. 

The GFP and BFP mutants of this invention, in 
contrast to the wild type protein or other reported mutants, 
allow detection of green fluorescence in living mammalian 
cells when present in few copies stably integrated into the 
genome. This high cellular fluorescence of the mutant GFPs 
and BFPs is useful for rapid and simple detection of gene 
expression in living cells and tissues and for repeated 
analysis of gene expression over time under a variety of 
conditions. They are also useful for the construction of 
stable marked cell lines that can be quickly identified by 
fluorescence microscopy or fluorescence-activated cell 
sorting. 

Example 8 

We have established fluoroplate-based assays for the 
quantitation of gene expression after transfections. In a 
number of embodiments, a nucleic acid encoding a mutant GFP or 
BFP of this invention is inserted into a vector and introduced 
into and expressed in a cell. Typically, expression of GFP 
mutants can be detected as quickly as 5 hours post-infection 
or less. Expression is followed over time in living cells by 
a simple measurement in multi-well plates. In this way, many 
transfections can be processed in parallel. 

Example 9 

The vectors and nucleic acids provided herein are 
used to generate chimeric proteins wherein a nucleic acid 
sequence that encodes a selected gene product is fused to the 
C- or N-terminus of the mutant GFPs and/or BFPs of this 
invention. A number of unique viral, plasmid and hybrid gene 
constructs have been generated that incorporate the new mutant 
GFP and/or mutant BFP sequences indicated above. These 
include: 

• HIV viral sequences (in the nef gene) containing SG11 or 
SG25 

• Neomycin & hygromycin plasmids containing SG11 or SG25 

• Moloney Leukemia Virus vector (retrovirus) also 
expressing SG25 

• Hybrid gene constructs expressing HIV viral proteins 
(rev, td-rev, tat, nef, gag, env, and vpr) and either 
SG11 or SG25 or SB50. 

• Hybrid gene construct containing vectors that incorporate 
the cytoplasmic proteins ran, B23, nucleolin, poly-A 
binding protein and either SG11 or SG25 or SB50. 

These hybrids of the mutant nucleic acids provided 
herein are used to study protein trafficking in living 
mammalian cells. Like the wild type GFP, the mutant GFP 
proteins are normally distributed throughout the cell except 
for the nucleolus. Fusions to other proteins redistribute the 
fluorescence, depending on the partner in the hybrid. For 
example, fusion with the entire HIV-1 Rev protein results in a 
hybrid molecule which retains the Rev function and is 
localized in the nucleolus where Rev is preferentially found. 
Fusion to the N-terminal domain of the HIV-1 Nef protein 
created a chimeric protein detected in the plasma membrane, 
the site of Nef localization. 

Example 10: pCMVgfo11 

pCMVgfo11 is a pFRED11 derivative containing the 
bacterial neomycin phosphotransferase gene (neo) (Southern and 
Berg (1982) J. Mol. Appl. Genetics 1:327) fused at the 
C-terminus of SG11. A four amino acid (Gly-Ala-Gly-Ala) (SEQ 
ID NO: 26) linker region connects the last amino acid of SG11 
to the second amino acid of neo, thus generating the hybrid 
SG11-neo protein (gfo11, SEQ ID NO: 25). Gfo11 is expressed 
from the CMV promoter and contains the intact SG11 polypeptide 
and all of neo except for the first Met. 

pCMVgfo11 was constructed in several steps. First, 
pFRED11DNae was constructed by NaeI digestion of pFRED11 and 
self-ligation of the 4613 bp fragment. The NaeI deletion 
removes the SV40 promoter and neo gene from pFRED11, thus 
creating pFRED11DNae. Next, in order to fuse the neo coding 
region downstream to SG11, the neo gene was PCR amplified from 
pcDNA3 using primers Bio51 
(5'-CGCGGATCCTTCGAACAAGATGGATTGCACGC-3') (SEQ ID NO: 27) and 
Bio52 (5'-CCGGAATTCTCAGAAGAACTCGTCAAGAAGGCGA-3') (SEQ ID 
NO: 28). Primer Bio51 introduces a BamHI site followed by a 
BstBI recognition sequence at the 5' end of neo, while primer 
Bio52 introduces an EcoRI site 3' to the neo gene. The PCR 
product was digested with BamHI and EcoRI and cloned into the 
4582 bp vector resulting from the BamHI-EcoRI digestion of 
pFRED11DNae, thus generating pFRED11DNaeBstNeo. Subsequently, 
SG11 was PCR amplified from pFRED11DNae using primers Bio49 
(5'-GGCGCGCAAGAAATGGCTAGCAAAGGAGAAGAACTCTTCACTGGAG-3') (SEQ ID 
NO: 29) and Bio50 
(5'-CCCATCGATAGCACCAGCACCGTTGTACAGTTCATCCATGCCATGT-3') (SEQ ID 
NO: 30) to remove the sg11 stop codon in pFRED11DNaeBstNeo and 
to introduce the four amino acid (Gly-Ala-Gly-Ala) linker 
followed by a ClaI site. The PCR product was digested with 
NheI and ClaI and cloned into the 4763 bp NheI-BstBI fragment 
from pFRED11DNaeBstNeo, thus generating pCMVgfo11. 
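
The restriction sites and linker attributed to the four primers can be confirmed directly from their sequences. The Python sketch below searches each primer for the recognition sequences of the enzymes named in the text and decodes the linker carried by the antisense primer Bio50. The recognition sequences are the standard ones for these enzymes; the helper names, and the assumption that the linker occupies the twelve bases immediately 3' of the ClaI site in Bio50, are illustrative and not part of the original disclosure.

```python
# Recognition sequences of the enzymes named in the text. All are palindromic,
# so searching the primer strand as written is sufficient.
SITES = {
    "BamHI": "GGATCC",
    "BstBI": "TTCGAA",
    "EcoRI": "GAATTC",
    "ClaI": "ATCGAT",
    "NheI": "GCTAGC",
}

PRIMERS = {
    "Bio49": "GGCGCGCAAGAAATGGCTAGCAAAGGAGAAGAACTCTTCACTGGAG",
    "Bio50": "CCCATCGATAGCACCAGCACCGTTGTACAGTTCATCCATGCCATGT",
    "Bio51": "CGCGGATCCTTCGAACAAGATGGATTGCACGC",
    "Bio52": "CCGGAATTCTCAGAAGAACTCGTCAAGAAGGCGA",
}

# Only the two codons needed to decode the linker are included here.
LINKER_CODONS = {"GGT": "Gly", "GCT": "Ala"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq):
    """Map enzyme name to the 0-based index of its first site in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


site_map = {name: find_sites(seq) for name, seq in PRIMERS.items()}
for name, hits in site_map.items():
    print(name, hits)

# Bio50 is antisense: the linker lies just 3' of the ClaI site on the primer
# (assumed span of 12 bases) and reads Gly-Ala-Gly-Ala on the sense strand.
bio50 = PRIMERS["Bio50"]
start = bio50.find(SITES["ClaI"]) + len(SITES["ClaI"])
linker_sense = reverse_complement(bio50[start:start + 12])
linker = [LINKER_CODONS[linker_sense[i:i + 3]] for i in range(0, 12, 3)]
print("linker:", "-".join(linker))
```

In Bio51 the BamHI site precedes the BstBI site, consistent with the description above, and the decoded linker agrees with the Gly-Ala-Gly-Ala sequence of SEQ ID NO: 26.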

Following transfection of 293 cells (Graham et al. 
(1977), J. Gen. Virol. 5:59) as well as other human and mouse 
cell lines with pCMVgfo11, bright fluorescent transfectants 
were apparent under the fluorescence microscope and colonies 
resistant to G418 could be obtained two weeks later. 

It should be noted that pCMVgfo11 was the best 
protein fusion in terms of fluorescent emission intensity and 
number of G418-resistant colonies compared to several SG11-neo 
or neo-SG11 fusions generated and examined. 

Example 11: pPGKgfo25 

pPGKgfo25 is a pCMVgfo11 derivative containing SG25 
instead of SG11 within gfo (SEQ ID NO: 31). Expression of 
gfo25 in pPGKgfo25 is under the control of the mouse 
phosphoglycerate kinase-1 (PGK) promoter. 

pPGKgfo25 was constructed in several steps. First, 
a SacII site was introduced downstream of the PGK promoter in 
pPGKneobpA (Soriano et al. (1991) Cell 64:393) by: 

i) annealing oligonucleotides #18990 (SEQ ID NO: 32) 
(5'-GACCGGGACACGTATCCAGCCTCCGC-3') and #18991 (SEQ ID 
NO: 33) (5'-GGAGGCTGGATACGTGTCCCGGTCTGCA-3') to create a 
double-stranded adapter for PstI at the 5' end and SacII 
at the 3' end. 

ii) ligating this adapter to the 3423 bp fragment from the 
PstI-SacII double digestion of pPGKneobpA, thus 
generating pPGKPtAfSc. 

Next, the CMV promoter of pFRED25 was replaced with the PGK 
promoter by cloning the 565 bp SalI (filled with Klenow)-SacII 
region from pPGKPtAfSc to the 5288 bp BglII (filled with 
Klenow)-SacII fragment from pFRED25, resulting in pFRED25PGK. 
In the final step, pPGKgfo25 was constructed by ligating the 
813 bp BglII-NdeI fragment from pFRED25PGK containing the PGK 
promoter and SG25, to the 4185 bp BglII-NdeI fragment of 
pCMVgfo11. 

Example 12: pGen-PGKgfo25RO (SEQ ID NO: 34) 

pGen-PGKgfo25RO is a pGen- (Soriano et al. (1991), J. 
Virol. 65:2314) derivative containing the gfo25 hybrid under 
the control of the PGK promoter. It was constructed by subcloning 
the 2810 bp SalI fragment of pPGKgfo25 into the XhoI site of 
pGen. In viruses generated from pGen-PGKgfo25RO (see below) 
transcription originating from the PGK promoter is in reverse 
orientation (RO) to that initiated from the viral long 
terminal repeats (LTR). 

To generate ecotropic or pseudotyped viruses, 
pGen-PGKgfo25RO was co-transfected into 293 cells together 
with pHIT60 and pHIT123 DNAs (production of ecotropic virus) 
or with pHIT60 and pHCMV-G DNAs (production of pseudotyped 
virus). pHIT60 and pHIT123 contain the gag-pol and env coding 
regions from the Moloney murine leukemia virus (Mo-MLV), 
respectively, under the control of the CMV promoter (Soneoka 
et al. (1995), Nuc. Acid Res. 23:628). pHCMV-G contains the 
coding region of the G protein from the vesicular stomatitis 
virus (VSV) expressed from the CMV promoter (Yee et al. 
(1994), Proc. Nat'l Acad. Sci. USA 91:9564). Virus-containing 
supernatants were harvested 48 hours post-transfection, 
filtered and stored at -80°C. 

Example 13: pNLnSG11 (SEQ ID NO: 35) 

The SG11 sequence from plasmid pFRED11 was 
PCR-amplified with primers #17982 (SEQ ID NO: 36) 
(5'-GGGGCGTACGGAGCGCTCCGAATTCGGTACCGTTTAAACGGGCCCTCTCGAGTCCGTTGTACAGTTCATCCATG-3') 
and #17983 (SEQ ID NO: 37) 
(5'-GGGGGAATTCGCGCGCGTACGTAAGCGCTAGCTGAGCAAGAAATGGCTAGCAAAGGAGAAGAACTC-3'). 
The PCR product was digested with BlpI and 
XhoI and cloned into the large BlpI-XhoI fragment from pNL4-3 
(Adachi et al. (1986), J. Virol. 59:284). In pNLnSG11 the 
full SG11 polypeptide, containing an additional four 
linker-encoded amino acids at the C-terminus, is expressed as 
a hybrid protein with the 24 N-terminal amino acids of the 
HIV-1 protein Nef. 

We constructed transmissible HIV-1 stocks with our 
mutants, which generate green fluorescence upon transfection 
of human cells. These transmissible HIV-1 stocks are used to 
detect the kinetics of infection under a variety of 
conditions. In particular, they are used to study the effects 
of drugs on the kinetics of infection. The level of 
fluorescence, and the subcellular compartmentalization of that 
fluorescence, is easily visualized and quantified using well 
known methods. This system is easy to visualize, and 
dramatically cuts the costs of many experiments that are 
presently tedious and expensive. 

To produce infectious virus, pNLnSG11 was 
transfected into 293 cells. 24 hours later, Jurkat cells were 
added to the transfectants. At various times post-infection, 
the medium was removed, filtered, and used to infect fresh 
Jurkat or other HIV-1-permissive cells. Two days later the 
infected cells were green under the fluorescence microscope. 
Visible syncytia were also green. Viral stocks were generated 
and kept at -80°C. 

When the nucleic acids, vectors, and mutant proteins 
provided herein are combined with the knowledge of those 
skilled in the art of genetic engineering and the guidance 
provided herein, it will be apparent to one of ordinary skill 
in the art that many changes and modifications can be made 
thereto without departing from the spirit or scope of the 
invention as set forth herein. These changes and 
modifications are encompassed by the present invention. 

SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: Pavlakis, George N. 
               Gaitanaris, George A. 
               Stauber, Roland H. 
               Vournakis, John N. 

(ii) TITLE OF INVENTION: Mutant Aequorea victoria Fluorescent 
     Proteins Having Increased Cellular Fluorescence 

(iii) NUMBER OF SEQUENCES: 37 

(iv) CORRESPONDENCE ADDRESS: 
     (A) ADDRESSEE: Townsend and Townsend and Crew LLP 
     (B) STREET: Two Embarcadero Center, 8th Floor 
     (C) CITY: San Francisco 
     (D) STATE: California 
     (E) COUNTRY: USA 
     (F) ZIP: 94111-3834 

(v) COMPUTER READABLE FORM: 
     (A) MEDIUM TYPE: Floppy disk 
     (B) COMPUTER: IBM PC compatible 
     (C) OPERATING SYSTEM: PC-DOS/MS-DOS 
     (D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

(vi) CURRENT APPLICATION DATA: 
     (A) APPLICATION NUMBER: US Not yet assigned 
     (B) FILING DATE: Not yet assigned 
     (C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 
     (A) NAME: Weber, Kenneth A. 
     (B) REGISTRATION NUMBER: 31,677 
     (C) REFERENCE/DOCKET NUMBER: 015280-249000 

(ix) TELECOMMUNICATION INFORMATION: 
     (A) TELEPHONE: (415) 576-0200 
     (B) TELEFAX: (415) 576-0300 

45 (2) INFORMATION FOR SEQ ID NO : 1 : 

(i> SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 720 base pairs 

(B) TYPE: nucleic acid 
50 <C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

55 

(ix) FEATURE: 

(A) NAME /KEY : CDS 

(B) LOCATION: 1..720 

(D) OTHER INFORMATION: /product = "wild type Aequorea victoria 
60 Green Fluorescent Protein (wtGF) " 
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10 



20 



40 



50 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:l: 

ATG GCT AGC AAA GGA GAA GAA CTC TTC ACT GGA GTT GTC CCA ATT CTT 48
Met Ala Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu
1 5 10 15

GTT GAA TTA GAT GGT GAT GTT AAT GGG CAC AAA TTT TCT GTC AGT GGA 96
Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly
20 25 30

GAG GGT GAA GGT GAT GCA ACA TAC GGA AAA CTT ACC CTT AAA TTT ATT 144
Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile
35 40 45

TGC ACT ACT GGA AAA CTA CCT GTT CCA TGG CCA ACA CTT GTC ACT ACT 192
Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr
50 55 60

TTC TCT TAT GGT GTT CAA TGC TTT TCA AGA TAC CCG GAT CAT ATG AAA 240
Phe Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys
65 70 75 80

CGG CAT GAC TTT TTC AAG AGT GCC ATG CCC GAA GGT TAT GTA CAG GAA 288
Arg His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu
85 90 95

AGA ACT ATA TTT TTC AAA GAT GAC GGG AAC TAC AAG ACA CGT GCT GAA 336
Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu
100 105 110

GTC AAG TTT GAA GGT GAT ACC CTT GTT AAT AGA ATC GAG TTA AAA GGT 384
Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly
115 120 125

ATT GAT TTT AAA GAA GAT GGA AAC ATT CTT GGA CAC AAA TTG GAA TAC 432
Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr
130 135 140

AAC TAT AAC TCA CAC AAT GTA TAC ATC ATG GCA GAC AAA CAA AAG AAT 480
Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn
145 150 155 160

GGA ATC AAA GTT AAC TTC AAA ATT AGA CAC AAC ATT GAA GAT GGA AGC 528
Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser
165 170 175

GTT CAA CTA GCA GAC CAT TAT CAA CAA AAT ACT CCA ATT GGC GAT GGC 576
Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly
180 185 190

CCT GTC CTT TTA CCA GAC AAC CAT TAC CTG TCC ACA CAA TCT GCC CTT 624
Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu
195 200 205

TCG AAA GAT CCC AAC GAA AAG AGA GAC CAC ATG GTC CTT CTT GAG TTT 672
Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe
210 215 220

GTA ACA GCT GCT GGG ATT ACA CAT GGC ATG GAT GAA CTA TAC AAA TAA 720
Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys *
225 230 235 240
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 239 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

Met Ala Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu
1 5 10 15

Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly
20 25 30

Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile
35 40 45

Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr
50 55 60

Phe Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys
65 70 75 80

Arg His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu
85 90 95

Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu
100 105 110

Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly
115 120 125

Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr
130 135 140

Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn
145 150 155 160

Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser
165 170 175

Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly
180 185 190

Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu
195 200 205

Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe
210 215 220

Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys
225 230 235
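
As a consistency check between the nucleotide and protein listings, the following minimal Python sketch translates the first 48 nucleotides of SEQ ID NO:1 with the standard genetic code and compares the result with residues 1 to 16 of SEQ ID NO:2. The names `CODON_TABLE` and `translate` are illustrative assumptions and are not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, stopping at the first stop codon."""
    dna = dna.replace(" ", "").upper()
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First row of SEQ ID NO:1 (nucleotides 1-48), copied from the listing.
SEQ1_FIRST_ROW = "ATG GCT AGC AAA GGA GAA GAA CTC TTC ACT GGA GTT GTC CCA ATT CTT"

# Residues 1-16 of SEQ ID NO:2 in one-letter code
# (Met Ala Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu).
SEQ2_FIRST_ROW = "MASKGEELFTGVVPIL"

assert translate(SEQ1_FIRST_ROW) == SEQ2_FIRST_ROW
```

The same function could be applied to the full 720-nucleotide coding sequence; only the first row is used here to keep the example short.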



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..35

(D) OTHER INFORMATION: /note= "oligonucleotide sense primer #16417"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
GGAGGCGCGC AAGAAATGGC TAGCAAAGGA GAAGA 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..36

(D) OTHER INFORMATION: /note= "oligonucleotide antisense #16418"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
GCGGGATCCT TATTTGTATA GTTCATCCAT GCCATG 
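
The relationship between the two primers (SEQ ID NOS: 3 and 4) and the coding sequence of SEQ ID NO:1 can be checked with a short Python sketch. It uses only the sequences printed in this listing; the function name `reverse_complement` and the variable names are illustrative assumptions. The sense primer ends with the first 20 nucleotides of the coding sequence, and the reverse complement of the antisense primer begins with its last 27 nucleotides, through the TAA stop codon.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (spaces ignored)."""
    dna = dna.replace(" ", "").upper()
    return dna.translate(COMPLEMENT)[::-1]


# Primers as printed in SEQ ID NO:3 and SEQ ID NO:4.
SENSE_PRIMER = "GGAGGCGCGC AAGAAATGGC TAGCAAAGGA GAAGA".replace(" ", "")
ANTISENSE_PRIMER = "GCGGGATCCT TATTTGTATA GTTCATCCAT GCCATG".replace(" ", "")

# First 20 and last 27 nucleotides of the coding sequence in SEQ ID NO:1.
CDS_START = "ATGGCTAGCAAAGGAGAAGA"
CDS_END = "CATGGCATGGATGAACTATACAAATAA"

assert SENSE_PRIMER.endswith(CDS_START)
assert reverse_complement(ANTISENSE_PRIMER).startswith(CDS_END)
```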



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6238 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..6238

(D) OTHER INFORMATION: /note= "pFRED7"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



GACGGATCGG GAGATCTCCC GATCCCCTAT GGTCGACTCT CAGTACAATC TGCTCTGATG 60

CCGCATAGTT AAGCCAGTAT CTGCTCCCTG CTTGTGTGTT GGAGGTCGCT GAGTAGTGCG 120

CGAGCAAAAT TTAAGCTACA ACAAGGCAAG GCTTGACCGA CAATTGCATG AAGAATCTGC 180

TTAGGGTTAG GCGTTTTGCG CTGCTTCGCC TCGAGGCCTG GCCATTGCAT ACGTTGTATC 240

CATATCATAA TATGTACATT TATATTGGCT CATGTCCAAC ATTACCGCCA TGTTGACATT 300

GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA 360

TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC 420

CCCGCCCATT GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC 480
ATTGACGTCA ATGGGTGGAG TATTTACGGT AAACTGCCCA CTTGGCAGTA CATCAAGTGT 540

ATCATATGCC AAGTACGCCC CCTATTGACG TCAATGACGG TAAATGGCCC GCCTGGCATT 600

ATGCCCAGTA CATGACCTTA TGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA 660

TCGCTATTAC CATGGTGATG CGGTTTTGGC AGTACATCAA TGGGCGTGGA TAGCGGTTTG 720

ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG TTTTGGCACC 780

AAAATCAACG GGACTTTCCA AAATGTCGTA ACAACTCCGC CCCATTGACG CAAATGGGCG 840

GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCG TTTAGTGAAC CGTCAGATCG 900

CCTGGAGACG CCATCCACGC TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC 960

TCCGCGGGCG CGCAAGAAAT GGCTAGCAAA GGAGAAGAAC TCTTCACTGG AGTTGTCCCA 1020

ATTCTTGTTG AATTAGATGG TGATGTTAAT GGGCACAAAT TTTCTGTCAG TGGAGAGGGT 1080

GAAGGTGATG CAACATACGG AAAACTTACC CTTAAATTTA TTTGCACTAC TGGAAAACTA 1140

CCTGTTCCAT GGCCAACACT TGTCACTACT TTCTCTTATG GTGTTCAATG CTTTTCAAGA 1200

TACCCGGATC ATATGAAACG GCATGACTTT TTCAAGAGTG CCATGCCCGA AGGTTATGTA 1260

CAGGAAAGAA CTATATTTTT CAAAGATGAC GGGAACTACA AGACACGTGC TGAAGTCAAG 1320

TTTGAAGGTG ATACCCTTGT TAATAGAATC GAGTTAAAAG GTATTGATTT TAAAGAAGAT 1380

GGAAACATTC TTGGACACAA ATTGGAATAC AACTATAACT CACACAATGT ATACATCATG 1440

GCAGACAAAC AAAAGAATGG AATCAAAGTT AACTTCAAAA TTAGACACAA CATTGAAGAT 1500

GGAAGCGTTC AACTAGCAGA CCATTATCAA CAAAATACTC CAATTGGCGA TGGCCCTGTC 1560

CTTTTACCAG ACAACCATTA CCTGTCCACA CAATCTGCCC TTTCGAAAGA TCCCAACGAA 1620

AAGAGAGACC ACATGGTCCT TCTTGAGTTT GTAACAGCTG CTGGGATTAC ACATGGCATG 1680

GATGAACTAT ACAAATAAGG ATCCACTAGT AACGGCCGCC AGTGTGCTGG AATTCTGCAG 1740

ATATCCATCA CACTGGCGGC CGCTCGAGCA TGCATCTAGA GGGCCCTATT CTATAGTGTC 1800

ACCTAAATGC TAGAGCTCGC TGATCAGCCT CGACTGTGCC TTCTAGTTGC CAGCCATCTG 1860

TTGTTTGCCC CTCCCCCGTG CCTTCCTTGA CCCTGGAAGG TGCCACTCCC ACTGTCCTTT 1920

CCTAATAAAA TGAGGAAATT GCATCGCATT GTCTGAGTAG GTGTCATTCT ATTCTGGGGG 1980

GTGGGGTGGG GCAGGACAGC AAGGGGGAGG ATTGGGAAGA CAATAGCAGG CATGCTGGGG 2040

ATGCGGTGGG CTCTATGGCT TCTGAGGCGG AAAGAACCAG CTGGGGCTCT AGGGGGTATC 2100

CCCACGCGCC CTGTAGCGGC GCATTAAGCG CGGCGGGTGT GGTGGTTACG CGCAGCGTGA 2160

CCGCTACACT TGCCAGCGCC CTAGCGCCCG CTCCTTTCGC TTTCTTCCCT TCCTTTCTCG 2220

CCACGTTCGC CGGCTTTCCC CGTCAAGCTC [illegible] [illegible] GGGTTCCGAT 2280

TTAGTGCTTT ACGGCACCTC GACCCCAAAA AACTTGATTA GGGTGATGGT TCACGTAGTG 2340

GGCCATCGCC CTGATAGACG GTTTTTCGCC CTTTGACGTT GGAGTCCACG TTCTTTAATA 2400

GTGGACTCTT GTTCCAAACT GGAACAACAC TCAACCCTAT CTCGGTCTAT TCTTTTGATT 2460

TATAAGGGAT TTTGGGGATT TCGGCCTATT GGTTAAAAAA TGAGCTGATT TAACAAAAAT 2520

TTAACGCGAA TTAATTCTGT GGAATGTGTG TCAGTTAGGG TGTGGAAAGT CCCCAGGCTC 2580
CCCAGGCAGG CAGAAGTATG CAAAGCATGC ATCTCAATTA GTCAGCAACC AGGTGTGGAA 2640

AGTCCCCAGG CTCCCCAGCA GGCAGAAGTA TGCAAAGCAT GCATCTCAAT TAGTCAGCAA 2700

CCATAGTCCC GCCCCTAACT CCGCCCATCC CGCCCCTAAC TCCGCCCAGT TCCGCCCATT 2760

CTCCGCCCCA TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC GCCTCTGCCT 2820

CTGAGCTATT CCAGAAGTAG TGAGGAGGCT TTTTTGGAGG CCTAGGCTTT TGCAAAAAGC 2880

TCCCGGGAGC TTGTATATCC ATTTTCGGAT CTGATCAAGA GACAGGATGA GGATCGTTTC 2940

GCATGATTGA ACAAGATGGA TTGCACGCAG GTTCTCCGGC CGCTTGGGTG GAGAGGCTAT 3000

TCGGCTATGA CTGGGCACAA CAGACAATCG GCTGCTCTGA TGCCGCCGTG TTCCGGCTGT 3060

CAGCGCAGGG GCGCCCGGTT CTTTTTGTCA AGACCGACCT GTCCGGTGCC CTGAATGAAC 3120

TGCAGGACGA GGCAGCGCGG CTATCGTGGC TGGCCACGAC GGGCGTTCCT TGCGCAGCTG 3180

TGCTCGACGT TGTCACTGAA GCGGGAAGGG ACTGGCTGCT ATTGGGCGAA GTGCCGGGGC 3240

AGGATCTCCT GTCATCTCAC CTTGCTCCTG CCGAGAAAGT ATCCATCATG GCTGATGCAA 3300

TGCGGCGGCT GCATACGCTT GATCCGGCTA CCTGCCCATT CGACCACCAA GCGAAACATC 3360

GCATCGAGCG AGCACGTACT CGGATGGAAG CCGGTCTTGT CGATCAGGAT GATCTGGACG 3420

AAGAGCATCA GGGGCTCGCG CCAGCCGAAC TGTTCGCCAG GCTCAAGGCG CGCATGCCCG 3480

ACGGCGAGGA TCTCGTCGTG ACCCATGGCG ATGCCTGCTT GCCGAATATC ATGGTGGAAA 3540

ATGGCCGCTT TTCTGGATTC ATCGACTGTG GCCGGCTGGG TGTGGCGGAC CGCTATCAGG 3600

ACATAGCGTT GGCTACCCGT GATATTGCTG AAGAGCTTGG CGGCGAATGG GCTGACCGCT 3660

TCCTCGTGCT TTACGGTATC GCCGCTCCCG ATTCGCAGCG CATCGCCTTC TATCGCCTTC 3720

TTGACGAGTT CTTCTGAGCG GGACTCTGGG GTTCGAAATG ACCGACCAAG CGACGCCCAA 3780

CCTGCCATCA CGAGATTTCG ATTCCACCGC CGCCTTCTAT GAAAGGTTGG GCTTCGGAAT 3840

CGTTTTCCGG GACGCCGGCT GGATGATCCT CCAGCGCGGG GATCTCATGC TGGAGTTCTT 3900

CGCCCACCCC AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC 3960

AAATTTCACA AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT 4020

CAATGTATCT TATCATGTCT GTATACCGTC GACCTCTAGC TAGAGCTTGG CGTAATCATG 4080

GTCATAGCTG TTTCCTGTGT GAAATTGTTA TCCGCTCACA ATTCCACACA ACATACGAGC 4140

CGGAAGCATA AAGTGTAAAG CCTGGGGTGC CTAATGAGTG AGCTAACTCA CATTAATTGC 4200

GTTGCGCTCA CTGCCCGCTT TCCAGTCGGG AAACCTGTCG TGCCAGCTGC ATTAATGAAT 4260

CGGCCAACGC GCGGGGAGAG GCGGTTTGCG TATTGGGCGC TCTTCCGCTT CCTCGCTCAC 4320

TGACTCGCTG CGCTCGGTCG TTCGGCTGCG GCGAGCGGTA TCAGCTCACT CAAAGGCGGT 4380

AATACGGTTA TCCACAGAAT CAGGGGATAA CGCAGGAAAG AACATGTGAG CAAAAGGCCA 4440

GCAAAAGGCC AGGAACCGTA AAAAGGCCGC GTTGCTGGCG TTTTTCCATA GGCTCCGCCC 4500

CCCTGACGAG CATCACAAAA ATCGACGCTC AAGTCAGAGG TGGCGAAACC CGACAGGACT 4560

ATAAAGATAC CAGGCGTTTC CCCCTGGAAG CTCCCTCGTG CGCTCTCCTG TTCCGACCCT 4620

GCCGCTTACC GGATACCTGT CCGCCTTTCT CCCTTCGGGA AGCGTGGCGC TTTCTCAATG 4680
CTCACGCTGT AGGTATCTCA GTTCGGTGTA GGTCGTTCGC TCCAAGCTGG GCTGTGTGCA 4740

CGAACCCCCC GTTCAGCCCG ACCGCTGCGC CTTATCCGGT AACTATCGTC TTGAGTCCAA 4800

CCCGGTAAGA CACGACTTAT CGCCACTGGC AGCAGCCACT GGTAACAGGA TTAGCAGAGC 4860

GAGGTATGTA GGCGGTGCTA CAGAGTTCTT GAAGTGGTGG CCTAACTACG GCTACACTAG 4920

AAGGACAGTA TTTGGTATCT GCGCTCTGCT GAAGCCAGTT ACCTTCGGAA AAAGAGTTGG 4980

TAGCTCTTGA TCCGGCAAAC AAACCACCGC TGGTAGCGGT GGTTTTTTTG TTTGCAAGCA 5040

GCAGATTACG CGCAGAAAAA AAGGATCTCA AGAAGATCCT TTGATCTTTT CTACGGGGTC 5100

TGACGCTCAG TGGAACGAAA ACTCACGTTA AGGGATTTTG GTCATGAGAT TATCAAAAAG 5160

GATCTTCACC TAGATCCTTT TAAATTAAAA ATGAAGTTTT AAATCAATCT AAAGTATATA 5220

TGAGTAAACT TGGTCTGACA GTTACCAATG CTTAATCAGT GAGGCACCTA TCTCAGCGAT 5280

CTGTCTATTT CGTTCATCCA TAGTTGCCTG ACTCCCCGTC GTGTAGATAA CTACGATACG 5340

GGAGGGCTTA CCATCTGGCC CCAGTGCTGC AATGATACCG CGAGACCCAC GCTCACCGGC 5400

TCCAGATTTA TCAGCAATAA ACCAGCCAGC CGGAAGGGCC [illegible] [illegible] 5460

AACTTTATCC GCCTCCATCC AGTCTATTAA TTGTTGCCGG GAAGCTAGAG TAAGTAGTTC 5520

GCCAGTTAAT AGTTTGCGCA ACGTTGTTGC CATTGCTACA GGCATCGTGG TGTCACGCTC 5580

GTCGTTTGGT ATGGCTTCAT TCAGCTCCGG TTCCCAACGA TCAAGGCGAG TTACATGATC 5640

CCCCATGTTG TGCAAAAAAG CGGTTAGCTC CTTCGGTCCT CCGATCGTTG TCAGAAGTAA 5700

GTTGGCCGCA GTGTTATCAC TCATGGTTAT GGCAGCACTG CATAATTCTC TTACTGTCAT 5760

GCCATCCGTA AGATGCTTTT CTGTGACTGG TGAGTACTCA ACCAAGTCAT TCTGAGAATA 5820

GTGTATGCGG CGACCGAGTT GCTCTTGCCC GGCGTCAATA CGGGATAATA CCGCGCCACA 5880

TAGCAGAACT TTAAAAGTGC TCATCATTGG AAAACGTTCT TCGGGGCGAA AACTCTCAAG 5940

GATCTTACCG CTGTTGAGAT CCAGTTCGAT GTAACCCACT CGTGCACCCA ACTGATCTTC 6000

AGCATCTTTT ACTTTCACCA GCGTTTCTGG GTGAGCAAAA ACAGGAAGGC AAAATGCCGC 6060

AAAAAAGGGA ATAAGGGCGA CACGGAAATG TTGAATACTC ATACTCTTCC TTTTTCAATA 6120

TTATTGAAGC ATTTATCAGG GTTATTGTCT CATGAGCGGA TACATATTTG AATGTATTTA 6180

GAAAAATAAA CAAATAGGGG TTCCGCGCAC ATTTCCCCGA AAAGTGCCAC CTGACGTC 6238



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3699 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..3699

(D) OTHER INFORMATION: /note= "pBSGFP"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

GGAAATTGTA AACGTTAATA TTTTGTTAAA ATTCGCGTTA AATTTTTGTT AAATCAGCTC 60 

ATTTTTTAAC CAATAGGCCG AAATCGGCAA AATCCCTTAT AAATCAAAAG AATAGACCGA 120 

GATAGGGTTG AGTGTTGTTC CAGTTTGGAA CAAGAGTCCA CTATTAAAGA ACGTGGACTC 180 

CAACGTCAAA GGGCGAAAAA CCGTCTATCA GGGCGATGGC CCACTACGTG AACCATCACC 240 

CTAATCAAGT TTTTTGGGGT CGAGGTGCCG TAAAGCACTA AATCGGAACC CTAAAGGGAG 300 

CCCCCGATTT AGAGCTTGAC GGGGAAAGCC GGCGAACGTG GCGAGAAAGG AAGGGAAGAA 360 

AGCGAAAGGA GCGGGCGCTA GGGCGCTGGC AAGTGTAGCG GTCACGCTGC GCGTAACCAC 420 

CACACCCGCC GCGCTTAATG CGCCGCTACA GGGCGCGTCG CGCCATTCGC CATTCAGGCT 480 

GCGCAACTGT TGGGAAGGGC GATCGGTGCG GGCCTCTTCG CTATTACGCC AGCTGGCGAA 540 

AGGGGGATGT GCTGCAAGGC GATTAAGTTG GGTAACGCCA GGGTTTTCCC AGTCACGACG 600 

TTGTAAAACG ACGGCCAGTG AATTGTAATA CGACTCACTA TAGGGCGAAT TGGGTACCGG 660 

GCCCCCCCTC GAGGTCGACG GTATCGATAA GCTTGATGAT CCTTATTTGT ATAGTTCATC 720 

CATGCCATGT GTAATCCCAG CAGCTGTTAC AAACTCAAGA AGGACCATGT GGTCTCTCTT 780

TTCGTTGGGA TCTTTCGAAA GGGCAGATTG TGTGGACAGG TAATGGTTGT CTGGTAAAAG 840 

GACAGGGCCA TCGCCAATTG GAGTATTTTG TTGATAATGG TCTGCTAGTT GAACGCTTCC 900 

ATCTTCAATG TTGTGTCTAA TTTTGAAGTT AACTTTGATT CCATTCTTTT GTTTGTCTGC 960 

CATGATGTAT ACATTGTGTG AGTTATAGTT GTATTCCAAT TTGTGTCCAA GAATGTTTCC 1020 

ATCTTCTTTA AAATCAATAC CTTTTAACTC GATTCTATTA ACAAGGGTAT CACCTTCAAA 1080 

CTTGACTTCA GCACGTGTCT TGTAGTTCCC GTCATCTTTG AAAAATATAG TTCTTTCCTG 1140 

TACATAACCT TCGGGCATGG CACTCTTGAA AAAGTCATGC CGTTTCATAT GATCCGGGTA 1200 

TCTTGAAAAG CATTGAACAC CATAAGAGAA AGTAGTGACA AGTGTTGGCC ATGGAACAGG 1260 

TAGTTTTCCA GTAGTGCAAA TAAATTTAAG GGTAAGTTTT CCGTATGTTG CATCACCTTC 1320 

ACCCTCTCCA CTGACAGAAA ATTTGTGCCC ATTAACATCA CCATCTAATT CAACAAGAAT 1380

TGGGACAACT CCAGTGAAGA GTTCTTCTCC TTTGCTAGCC ATTTCTTGCG CGATCGAATT 1440 

CCTGCAGCCC GGGGGATCCA CTAGTTCTAG AGCGGCCGCC ACCGCGGTGG AGCTCCAGCT 1500 

TTTGTTCCCT TTAGTGAGGG TTAATTCCGA GCTTGGCGTA ATCATGGTCA TAGCTGTTTC 1560 

CTGTGTGAAA TTGTTATCCG CTCACAATTC CACACAACAT ACGAGCCGGA AGCATAAAGT 1620 

GTAAAGCCTG GGGTGCCTAA TGAGTGAGCT AACTCACATT AATTGCGTTG CGCTCACTGC 1680 

CCGCTTTCCA GTCGGGAAAC CTGTCGTGCC AGCTGCATTA ATGAATCGGC CAACGCGCGG 1740 

GGAGAGGCGG TTTGCGTATT GGGCGCTCTT CCGCTTCCTC GCTCACTGAC TCGCTGCGCT 1800 

CGGTCGTTCG GCTGCGGCGA GCGGTATCAG CTCACTCAAA GGCGGTAATA CGGTTATCCA 1860 

CAGAATCAGG GGATAACGCA GGAAAGAACA TGTGAGCAAA AGGCCAGCAA AAGGCCAGGA 1920 

ACCGTAAAAA GGCCGCGTTG CTGGCGTTTT TCCATAGGCT CCGCCCCCCT GACGAGCATC 1980 

ACAAAAATCG ACGCTCAAGT CAGAGGTGGC GAAACCCGAC AGGACTATAA AGATACCAGG 2040 
CGTTTCCCCC TGGAAGCTCC CTCGTGCGCT CTCCTGTTCC GACCCTGCCG CTTACCGGAT 2100

ACCTGTCCGC CTTTCTCCCT TCGGGAAGCG TGGCGCTTTC TCATAGCTCA CGCTGTAGGT 2160

ATCTCAGTTC GGTGTAGGTC GTTCGCTCCA AGCTGGGCTG TGTGCACGAA CCCCCCGTTC 2220

AGCCCGACCG CTGCGCCTTA TCCGGTAACT ATCGTCTTGA GTCCAACCCG GTAAGACACG 2280

ACTTATCGCC ACTGGCAGCA GCCACTGGTA ACAGGATTAG CAGAGCGAGG TATGTAGGCG 2340

GTGCTACAGA GTTCTTGAAG TGGTGGCCTA ACTACGGCTA CACTAGAAGG ACAGTATTTG 2400

GTATCTGCGC TCTGCTGAAG CCAGTTACCT TCGGAAAAAG AGTTGGTAGC TCTTGATCCG 2460

GCAAACAAAC CACCGCTGGT AGCGGTGGTT TTTTTGTTTG CAAGCAGCAG ATTACGCGCA 2520

GAAAAAAAGG ATCTCAAGAA GATCCTTTGA TCTTTTCTAC GGGGTCTGAC GCTCAGTGGA 2580

ACGAAAACTC ACGTTAAGGG ATTTTGGTCA TGAGATTATC AAAAAGGATC TTCACCTAGA 2640

TCCTTTTAAA TTAAAAATGA AGTTTTAAAT CAATCTAAAG TATATATGAG TAAACTTGGT 2700

CTGACAGTTA CCAATGCTTA ATCAGTGAGG CACCTATCTC AGCGATCTGT CTATTTCGTT 2760

CATCCATAGT TGCCTGACTC CCCGTCGTGT AGATAACTAC GATACGGGAG GGCTTACCAT 2820

CTGGCCCCAG TGCTGCAATG ATACCGCGAG ACCCACGCTC ACCGGCTCCA GATTTATCAG 2880

CAATAAACCA GCCAGCCGGA AGGGCCGAGC GCAGAAGTGG TCCTGCAACT TTATCCGCCT 2940

CCATCCAGTC TATTAATTGT TGCCGGGAAG CTAGAGTAAG TAGTTCGCCA GTTAATAGTT 3000

TGCGCAACGT TGTTGCCATT GCTACAGGCA TCGTGGTGTC ACGCTCGTCG TTTGGTATGG 3060

CTTCATTCAG CTCCGGTTCC CAACGATCAA GGCGAGTTAC ATGATCCCCC ATGTTGTGCA 3120

AAAAAGCGGT TAGCTCCTTC GGTCCTCCGA TCGTTGTCAG AAGTAAGTTG GCCGCAGTGT 3180

TATCACTCAT GGTTATGGCA GCACTGCATA ATTCTCTTAC TGTCATGCCA TCCGTAAGAT 3240

GCTTTTCTGT GACTGGTGAG TACTCAACCA AGTCATTCTG AGAATAGTGT ATGCGGCGAC 3300

CGAGTTGCTC TTGCCCGGCG TCAATACGGG ATAATACCGC GCCACATAGC AGAACTTTAA 3360

AAGTGCTCAT CATTGGAAAA CGTTCTTCCG GGCGAAAACT CTCAAGGATC TTACCGCTGT 3420

TGAGATCCAG TTCGATGTAA CCCACTCGTG CACCCAACTG ATCTTCAGCA TCTTTTACTT 3480

TCACCAGCGT TTCTGGGTGA GCAAAAACAG GAAGGCAAAA TGCCGCAAAA AAGGGAATAA 3540

GGGCGACACG GAAATGTTGA ATACTCATAC TCTTCCTTTT TCAATATTAT TGAAGCATTT 3600

ATCAGGGTTA TTGTCTCATG AGCGGATACA TATTTGAATG TATTTAGAAA AATAAACAAA 3660

TAGGGGTTCC GCGCACATTT CCCCGAAAAG TGCCACCTG 3699




(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6361 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..6361

(D) OTHER INFORMATION: /note= "pFRED13"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
TTCTCATGTT TGACAGCTTA TCATCGATAA GCTTTAATGC GGTAGTTTAT CACAGTTAAA 60

TTGCTAACGC AGTCAGGCAC CGTGTATGAA ATCTAACAAT GCGCTCATCG TCATCCTCGG 120

CACCGTCACC CTGGATGCTG TAGGCATAGG CTTGGTTATG CCGGTACTGC CGGGCCTCTT 180

GCGGGATATC CGGATATAGT TCCTCCTTTC AGCAAAAAAC CCCTCAAGAC CCGTTTAGAG 240

GCCCCAAGGG GTTATGCTAG TTATTGCTCA GCGGTGGCAG CAGCCAACTC AGCTTCCTTT 300

CGGGCTTTGT TAGCAGCCGG ATCCTTATTT GTATAGTTCA TCCATGCCAT GTGTAATCCC 360

AGCAGCTGTT ACAAACTCAA GAAGGACCAT GTGGTCTCTC TTTTCGTTGG GATCTTTCGA 420

AAGGGCAGAT TGTGTGGACA GGTAATGGTT GTCTGGTAAA AGGACAGGGC CATCGCCAAT 480

TGGAGTATTT TGTTGATAAT GGTCTGCTAG TTGAACGCTT CCATCTTCAA TGTTGTGTCT 540

AATTTTGAAG TTAACTTTGA TTCCATTCTT TTGTTTGTCT GCCATGATGT ATACATTGTG 600

TGAGTTATAG TTGTATTCCA ATTTGTGTCC AAGAATGTTT CCATCTTCTT TAAAATCAAT 660

ACCTTTTAAC TCGATTCTAT TAACAAGGGT ATCACCTTCA AACTTGACTT CAGCACGTGT 720

CTTGTAGTTC CCGTCATCTT TGAAAAATAT AGTTCTTTCC TGTACATAAC CTTCGGGCAT 780

GGCACTCTTG AAAAAGTCAT GCCGTTTCAT ATGATCCGGG TATCTTGAAA AGCATTGAAC 840

ACCATAAGAG AAAGTAGTGA CAAGTGTTGG CCATGGAACA GGTAGTTTTC CAGTAGTGCA 900

AATAAATTTA AGGGTAAGTT TTCCGTATGT TGCATCACCT TCACCCTCTC CACTGACAGA 960

AAATTTGTGC CCATTAACAT CACCATCTAA TTCAACAAGA ATTGGGACAA CTCCAGTGAA 1020

GAGTTCTTCT CCTTTGCTAG CCATATGTAT ATCTCCTTCT TAAAGTTAAA CAAAATTATT 1080

TCTAGAGGGG AATTGTTATC CGCTCACAAT TCCCCTATAG TGAGTCGTAT TAATTTCGCG 1140

GGATCGAGAT CTCGATCCTC TACGCCGGAC GCATCGTGGC CGGCATCACC GGCGCCACAG 1200

GTGCGGTTGC TGGCGCCTAT ATCGCCGACA TCACCGATGG GGAAGATCGG GCTCGCCACT 1260

TCGGGCTCAT GAGCGCTTGT TTCGGCGTGG GTATGGTGGC AGGCCCCGTG GCCGGGGGAC 1320

TGTTGGGCGC CATCTCCTTG CATGCACCAT TCCTTGCGGC GGCGGTGCTC AACGGCCTCA 1380

ACCTACTACT GGGCTGCTTC CTAATGCAGG AGTCGCATAA GGGAGAGCGT CGAGATCCCG 1440

GACACCATCG AATGGCGCAA AACCTTTCGC GGTATGGCAT GATAGCGCCC GGAAGAGAGT 1500

CAATTCAGGG TGGTGAATGT GAAACCAGTA ACGTTATACG ATGTCGCAGA GTATGCCGGT 1560

GTCTCTTATC AGACCGTTTC CCGCGTGGTG AACCAGGCCA GCCACGTTTC TGCGAAAACG 1620

CGGGAAAAAG TGGAAGCGGC GATGGCGGAG CTGAATTACA TTCCCAACCG CGTGGCACAA 1680

CAACTGGCGG GCAAACAGTC GTTGCTGATT GGCGTTGCCA CCTCCAGTCT GGCCCTGCAC 1740

GCGCCGTCGC AAATTGTCGC GGCGATTAAA TCTCGCGCCG ATCAACTGGG TGCCAGCGTG 1800

GTGGTGTCGA TGGTAGAACG AAGCGGCGTC GAAGCCTGTA AAGCGGCGGT GCACAATCTT 1860

CTCGCGCAAC GCGTCAGTGG GCTGATCATT AACTATCCGC TGGATGACCA GGATGCCATT 1920



WO 97/42320 

GCTGTGGAAG CTGCCTGCAC TAATGTTCCG GCGTTATTTC TTGATGTCTC TGACCAGACA 1980

CCCATCAACA GTATTATTTT CTCCCATGAA GACGGTACGC GACTGGGCGT GGAGCATCTG 2040

GTCGCATTGG GTCACCAGCA AATCGCGCTG TTAGCGGGCC CATTAAGTTC TGTCTCGGCG 2100

CGTCTGCGTC TGGCTGGCTG GCATAAATAT CTCACTCGCA ATCAAATTCA GCCGATAGCG 2160

GAACGGGAAG GCGACTGGAG TGCCATGTCC GGTTTTCAAC AAACCATGCA AATGCTGAAT 2220

GAGGGCATCG TTCCCACTGC GATGCTGGTT GCCAACGATC AGATGGCGCT GGGCGCAATG 2280

CGCGCCATTA CCGAGTCCGG GCTGCGCGTT GGTGCGGATA TCTCGGTAGT GGGATACGAC 2340

GATACCGAAG ACAGCTCATG TTATATCCCG CCGTTAACCA CCATCAAACA GGATTTTCGC 2400

CTGCTGGGGC AAACCAGCGT GGACCGCTTG CTGCAACTCT CTCAGGGCCA GGCGGTGAAG 2460

GGCAATCAGC TGTTGCCCGT CTCACTGGTG AAAAGAAAAA CCACCCTGGC GCCCAATACG 2520

CAAACCGCCT CTCCCCGCGC GTTGGCCGAT TCATTAATGC AGCTGGCACG ACAGGTTTCC 2580

CGACTGGAAA GCGGGCAGTG AGCGCAACGC AATTAATGTA AGTTAGCTCA CTCATTAGGC 2640

ACCGGGATCT CGACCGATGC CCTTGAGAGC CTTCAACCCA GTCAGCTCCT TCCGGTGGGC 2700

GCGGGGCATG ACTATCGTCG CCGCACTTAT GACTGTCTTC TTTATCATGC AACTCGTAGG 2760

ACAGGTGCCG GCAGCGCTCT GGGTCATTTT CGGCGAGGAC CGCTTTCGCT GGAGCGCGAC 2820

GATGATCGGC CTGTCGCTTG CGGTATTCGG AATCTTGCAC GCCCTCGCTC AAGCCTTCGT 2880

CACTGGTCCC GCCACCAAAC GTTTCGGCGA GAAGCAGGCC ATTATCGCCG GCATGGCGGC 2940

CGACGCGCTG GGCTACGTCT TGCTGGCGTT CGCGACGCGA GGCTGGATGG CCTTCCCCAT 3000

TATGATTCTT CTCGCTTCCG GCGGCATCGG GATGCCCGCG TTGCAGGCCA TGCTGTCCAG 3060

GCAGGTAGAT GACGACCATC AGGGACAGCT TCAAGGATCG CTCGCGGCTC TTACCAGCCT 3120

AACTTCGATC ACTGGACCGC TGATCGTCAC GGCGATTTAT GCCGCCTCGG CGAGCACATG 3180

GAACGGGTTG GCATGGATTG TAGGCGCCGC CCTATACCTT GTCTGCCTCC CCGCGTTGCG 3240

TCGCGGTGCA TGGAGCCGGG CCACCTCGAC CTGAATGGAA GCCGGCGGCA CCTCGCTAAC 3300

GGATTCACCA CTCCAAGAAT TGGAGCCAAT CAATTCTTGC GGAGAACTGT GAATGCGCAA 3360

ACCAACCCTT GGCAGAACAT ATCCATCGCG TCCGCCATCT CCAGCAGCCG CACGCGGCGC 3420

ATCTCGGGCA GCGTTGGGTC CTGGCCACGG GTGCGCATGA TCGTGCTCCT GTCGTTGAGG 3480

ACCCGGCTAG GCTGGCGGGG TTGCCTTACT GGTTAGCAGA ATGAATCACC GATACGCGAG 3540

CGAACGTGAA GCGACTGCTG CTGCAAAACG TCTGCGACCT GAGCAACAAC ATGAATGGTC 3600

TTCGGTTTCC GTGTTTCGTA AAGTCTGGAA ACGCGGAAGT CAGCGCCCTG CACCATTATG 3660

TTCCGGATCT GCATCGCAGG ATGCTGCTGG CTACCCTGTG GAACACCTAC ATCTGTATTA 3720

ACGAAGCGCT GGCATTGACC CTGAGTGATT TTTCTCTGGT CCCGCCGCAT CCATACCGCC 3780

AGTTGTTTAC CCTCACAACG TTCCAGTAAC CGGGCATGTT CATCATCAGT AACCCGTATC 3840

GTGAGCATCC TCTCTCGTTT CATCGGTATC ATTACCCCCA TGAACAGAAA TCCCCCTTAC 3900

ACGGAGGCAT CAGTGACCAA ACAGGAAAAA ACCGCCCTTA ACATGGCCCG CTTTATCAGA 3960

AGCCAGACAT TAACGCTTCT GGAGAAACTC AACGAGCTGG ACGCGGATGA ACAGGCAGAC 4020
ATCTGTGAAT CGCTTCACGA CCACGCTGAT GAGCTTTACC GCAGCTGCCT CGCGCGTTTC 4080

GGTGATGACG GTGAAAACCT CTGACACATG CAGCTCCCGG AGACGGTCAC AGCTTGTCTG 4140

TAAGCGGATG CCGGGAGCAG ACAAGCCCGT CAGGGCGCGT CAGCGGGTGT TGGCGGGTGT 4200

CGGGGCGCAG CCATGACCCA GTCACGTAGC GATAGCGGAG TGTATACTGG CTTAACTATG 4260

CGGCATCAGA GCAGATTGTA CTGAGAGTGC ACCATATATG CGGTGTGAAA TACCGCACAG 4320

ATGCGTAAGG AGAAAATACC GCATCAGGCG CTCTTCCGCT TCCTCGCTCA CTGACTCGCT 4380

GCGCTCGGTC GTTCGGCTGC GGCGAGCGGT ATCAGCTCAC TCAAAGGCGG TAATACGGTT 4440

ATCCACAGAA TCAGGGGATA ACGCAGGAAA GAACATGTGA GCAAAAGGCC AGCAAAAGGC 4500

CAGGAACCGT AAAAAGGCCG CGTTGCTGGC GTTTTTCCAT AGGCTCCGCC CCCCTGACGA 4560

GCATCACAAA AATCGACGCT CAAGTCAGAG GTGGCGAAAC CCGACAGGAC TATAAAGATA 4620

CCAGGCGTTT CCCCCTGGAA GCTCCCTCGT GCGCTCTCCT GTTCCGACCC TGCCGCTTAC 4680

CGGATACCTG TCCGCCTTTC TCCCTTCGGG AAGCGTGGCG CTTTCTCATA GCTCACGCTG 4740

TAGGTATCTC AGTTCGGTGT AGGTCGTTCG CTCCAAGCTG GGCTGTGTGC ACGAACCCCC 4800

CGTTCAGCCC GACCGCTGCG CCTTATCCGG TAACTATCGT CTTGAGTCCA ACCCGGTAAG 4860

ACACGACTTA TCGCCACTGG CAGCAGCCAC TGGTAACAGG ATTAGCAGAG CGAGGTATGT 4920

AGGCGGTGCT ACAGAGTTCT TGAAGTGGTG GCCTAACTAC GGCTACACTA GAAGGACAGT 4980

ATTTGGTATC TGCGCTCTGC TGAAGCCAGT TACCTTCGGA AAAAGAGTTG GTAGCTCTTG 5040

ATCCGGCAAA CAAACCACCG CTGGTAGCGG TGGTTTTTTT GTTTGCAAGC AGCAGATTAC 5100

GCGCAGAAAA AAAGGATCTC AAGAAGATCC TTTGATCTTT TCTACGGGGT CTGACGCTCA 5160

GTGGAACGAA AACTCACGTT AAGGGATTTT GGTCATGAGA TTATCAAAAA GGATCTTCAC 5220

CTAGATCCTT TTAAATTAAA AATGAAGTTT TAAATCAATC TAAAGTATAT ATGAGTAAAC 5280

TTGGTCTGAC AGTTACCAAT GCTTAATCAG TGAGGCACCT ATCTCAGCGA TCTGTCTATT 5340

TCGTTCATCC ATAGTTGCCT GACTCCCCGT CGTGTAGATA ACTACGATAC GGGAGGGCTT 5400

ACCATCTGGC CCCAGTGCTG CAATGATACC GCGAGACCCA CGCTCACCGG CTCCAGATTT 5460

ATCAGCAATA AACCAGCCAG CCGGAAGGGC CGAGCGCAGA AGTGGTCCTG CAACTTTATC 5520

CGCCTCCATC CAGTCTATTA ATTGTTGCCG GGAAGCTAGA GTAAGTAGTT CGCCAGTTAA 5580

TAGTTTGCGC AACGTTGTTG CCATTGCTGC AGGCATCGTG GTGTCACGCT CGTCGTTTGG 5640

TATGGCTTCA TTCAGCTCCG GTTCCCAACG ATCAAGGCGA GTTACATGAT CCCCCATGTT 5700

GTGCAAAAAA GCGGTTAGCT CCTTCGGTCC TCCGATCGTT GTCAGAAGTA AGTTGGCCGC 5760

AGTGTTATCA CTCATGGTTA TGGCAGCACT GCATAATTCT CTTACTGTCA TGCCATCCGT 5820

AAGATGCTTT TCTGTGACTG GTGAGTACTC AACCAAGTCA TTCTGAGAAT AGTGTATGCG 5880

GCGACCGAGT TGCTCTTGCC CGGCGTCAAC ACGGGATAAT ACCGCGCCAC ATAGCAGAAC 5940

TTTAAAAGTG CTCATCATTG GAAAACGTTC TTCGGGGCGA AAACTCTCAA GGATCTTACC 6000

GCTGTTGAGA TCCAGTTCGA TGTAACCCAC TCGTGCACCC AACTGATCTT CAGCATCTTT 6060

TACTTTCACC AGCGTTTCTG GGTGAGCAAA AACAGGAAGG CAAAATGCCG CAAAAAAGGG 6120
AATAAGGGCG ACACGGAAAT GTTGAATACT CATACTCTTC CTTTTTCAAT ATTATTGAAG 6180

CATTTATCAG GGTTATTGTC TCATGAGCGG ATACATATTT GAATGTATTT AGAAAAATAA 6240

ACAAATAGGG GTTCCGCGCA CATTTCCCCG AAAAGTGCCA CCTGACGTCT AAGAAACCAT 6300

TATTATCATG ACATTAACCT ATAAAAATAG GCGTATCACG AGGCCCTTTC GTCTTCAAGA 6360

A 6361



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 48 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..48

(D) OTHER INFORMATION: /note= "oligonucleotide #17422"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
CAATTTGTGT CCCAGAATGT TGCCATCTTC CTTGAAGTCA ATACCTTT 48

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..47

(D) OTHER INFORMATION: /note= "oligonucleotide #17423"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
GTCTTGTAGT TGCCGTCATC TTTGAAGAAG ATGCTCCTTT CCTGTAC



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 52 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..52

(D) OTHER INFORMATION: /note= "oligonucleotide #17424"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: 
CATGGAACAG GCAGTTTGCC AGTAGTGCAG ATGAACTTCA GGGTAAGTTT TC 



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..40

(D) OTHER INFORMATION: /note= "oligonucleotide #17425"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CTCCACTGAC AGAGAACTTG TGGCCGTTAA CATCACCATC 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA 
CTGGGACACA AATTGGAATA CAACTATAAC TCACACAATG TATACATCAT GGCAGACAAA 600

CAAAAGAATG GAATCAAAGT GAACTTCAAG ACCCGCCACA ACATTGAAGA TGGAAGCGTT 660

CAACTAGCAG ACCATTATCA ACAAAATACT CCAATTGGCG ATGGCCCTGT CCTTTTACCA 720

GACAACCATT ACCTGTCCAC ACAATCTGCC CTTTCGAAAG ATCCCAACGA AAAGAGAGAC 780

CACATGGTCC TTCTTGAGTT TGTAACAGCT GCTGGGATTA CACATGGCAT GGATGAACTG 840

TACAACTGA 849


(2) INFORMATION FOR SEQ ID NO: 15: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 720 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..720

(D) OTHER INFORMATION: /note= "SG12"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60

GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120

GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180

CTTGTCACTA CTCTCTCTTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240

CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300

TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360

GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420

AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480

GGAATCAAAG TTAACTTCAA AATTAGACAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540

GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600

TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660

CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720

(2) INFORMATION FOR SEQ ID NO:16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 720 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA 
(ix) FEATURE: 
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600

TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660

CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACTGA 720

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 40 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..40

(D) OTHER INFORMATION: /note= "oligonucleotide #18217"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CATTGAACAC CATAGCACAG AGTAGTGACT AGTGTTGGCC 40

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 720 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..720

(D) OTHER INFORMATION: /note= "SB42" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60 

GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120 

GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180 

CTAGTCACTA CTCTCTCTCA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240 

CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300

TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360 

GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420 

AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480 

GGAATCAAAG TTAACTTCAA AATTAGACAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540

GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600 

TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660 
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CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720 
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..40

(D) OTHER INFORMATION: /note= "oligonucleotide #bio25"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
CATTGAACAC CATGAGAGAG AGTAGTGACT AGTGTTGGCC 40

(2) INFORMATION FOR SEQ ID NO:21: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 720 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..720

(D) OTHER INFORMATION: /note= "SB49"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60

GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120

GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180

CTAGTCACTA CTTTCTCTCA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240

CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300

TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360

GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420

AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480

GGAATCAAAG CGAACTTCAA GATCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540

GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600

TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660

CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720
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(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 44 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..44

(D) OTHER INFORMATION: /note= "oligonucleotide #19059"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
CTTCAATGTT GTGGCGGATC TTGAAGTTCG CTTTGATTCC ATTC 44



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -

(B) LOCATION: 1..40

(D) OTHER INFORMATION: /note= "oligonucleotide #bio24"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

CATTGAACAC CATGAGAGAA AGTAGTGACT AGTGTTGGCC 40



(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 720 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

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(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..720 

(D) OTHER INFORMATION: /note= "SB50" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
[illegible] ATGGCCAACA 180
CTAGTCACTA CTCTCTCTCA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG CGAACTTCAA GATCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1521 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1521

(D) OTHER INFORMATION: /note= "pCMVgfoll" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTTGTCACTA CTCTCTCTTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TCCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACGGT 720
GCTGGTGCTA TCGAACAAGA TGGATTGCAC GCAGGTTCTC CGGCCGCTTG GGTGGAGAGG 780
CTATTCGGCT ATGACTGGGC ACAACAGACA ATCGGCTGCT CTGATGCCGC CGTGTTCCGG 840
CTGTCAGCGC AGGGGCGCCC GGTTCTTTTT GTCAAGACCG ACCTGTCCGG TGCCCTGAAT 900
GAACTGCAGG ACGAGGCAGC GCGGCTATCG TGGCTGGCCA CGACGGGCGT TCCTTGCGCA 960
GCTGTGCTCG ACGTTGTCAC TGAAGCGGGA AGGGACTGGC TGCTATTGGG CGAAGTGCCG 1020
GGGCAGGATC TCCTGTCATC TCACCTTGCT CCTGCCGAGA AAGTATCCAT CATGGCTGAT 1080
GCAATGCGGC GGCTGCATAC GCTTGATCCG GCTACCTGCC CATTCGACCA CCAAGCGAAA 1140
CATCGCATCG AGCGAGCACG TACTCGGATG GAAGCCGGTC TTGTCGATCA GGATGATCTG 1200
GACGAAGAGC ATCAGGGGCT CGCGCCAGCC GAACTGTTCG CCAGGCTCAA GGCGCGCATG 1260
CCCGACGGCG AGGATCTCGT CGTGACCCAT GGCGATGCCT GCTTGCCGAA TATCATGGTG 1320
GAAAATGGCC GCTTTTCTGG ATTCATCGAC TGTGGCCGGC TGGGTGTGGC GGACCGCTAT 1380
CAGGACATAG CGTTGGCTAC CCGTGATATT GCTGAAGAGC TTGGCGGCGA ATGGGCTGAC 1440
CGCTTCCTCG TGCTTTACGG TATCGCCGCT CCCGATTCGC AGCGCATCGC CTTCTATCGC 1500
CTTCTTGACG AGTTCTTCTG A 1521

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Gly Ala Gly Ala
1

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

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(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..32

(D) OTHER INFORMATION: /note= "primer Bio51"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
CGCGGATCCT TCGAACAAGA TGGATTGCAC GC 32

(2) INFORMATION FOR SEQ ID NO: 28: 



(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -
(B) LOCATION: 1..34

(D) OTHER INFORMATION: /note= "primer Bio52"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
CCGGAATTCT CAGAAGAACT CGTCAAGAAG GCGA 34

(2) INFORMATION FOR SEQ ID NO: 29: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..46

(D) OTHER INFORMATION: /note= "primer Bio49"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 
GGCGCGCAAG AAATGGCTAG CAAAGGAGAA GAACTCTTCA CTGGAG 46 



(2) INFORMATION FOR SEQ ID NO: 30: 



(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 46 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..46

(D) OTHER INFORMATION: /note= "primer Bio50"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
CCCATCGATA GCACCAGCAC CGTTGTACAG TTCATCCATG CCATGT 46

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1521 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1521

(D) OTHER INFORMATION: /note= "pPGKgfo25"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTAGTCACTA CTCTGTGCTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACGGT 720
GCTGGTGCTA TCGAACAAGA TGGATTGCAC GCAGGTTCTC CGGCCGCTTG GGTGGAGAGG 780
CTATTCGGCT ATGACTGGGC ACAACAGACA ATCGGCTGCT CTGATGCCGC CGTGTTCCGG 840
CTGTCAGCGC AGGGGCGCCC GGTTCTTTTT GTCAAGACCG ACCTGTCCGG TGCCCTGAAT 900
GAACTGCAGG ACGAGGCAGC GCGGCTATCG TGGCTGGCCA CGACGGGCGT TCCTTGCGCA 960
GCTGTGCTCG ACGTTGTCAC TGAAGCGGGA AGGGACTGGC TGCTATTGGG CGAAGTGCCG 1020
GGGCAGGATC TCCTGTCATC TCACCTTGCT CCTGCCGAGA AAGTATCCAT CATGGCTGAT 1080
GCAATGCGGC GGCTGCATAC GCTTGATCCG GCTACCTGCC CATTCGACCA CCAAGCGAAA 1140
CATCGCATCG AGCGAGCACG TACTCGGATG GAAGCCGGTC TTGTCGATCA GGATGATCTG 1200
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GACGAAGAGC ATCAGGGGCT CGCGCCAGCC GAACTGTTCG CCAGGCTCAA GGCGCGCATG 1260 

CCCGACGGCG AGGATCTCGT CGTGACCCAT GGCGATGCCT GCTTGCCGAA TATCATGGTG 1320 

GAAAATGGCC GCTTTTCTGG ATTCATCGAC TGTGGCCGGC TGGGTGTGGC GGACCGCTAT 1380 

CAGGACATAG CGTTGGCTAC CCGTGATATT GCTGAAGAGC TTGGCGGCGA ATGGGCTGAC 1440 

CGCTTCCTCG TGCTTTACGG TATCGCCGCT CCCGATTCGC AGCGCATCGC CTTCTATCGC 1500

CTTCTTGACG AGTTCTTCTG A 1521 

(2) INFORMATION FOR SEQ ID NO: 32: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..26

(D) OTHER INFORMATION: /note= "oligonucleotide #18990"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
GACCGGGACA CGTATCCAGC CTCCGC 26 

(2) INFORMATION FOR SEQ ID NO: 33: 



(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 28 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..28

(D) OTHER INFORMATION: /note= "oligonucleotide #18991"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:
GGAGGCTGGA TACGTGTCCC GGTCTGCA 28

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7617 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA 
(ix) FEATURE: 
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(A) NAME/ KEY: - 

(B) LOCATION: 1..7617

(D) OTHER INFORMATION: /note= "pGeri- PGKgf o2SRO"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
TCGAGGTCGA CGGTATCGAT TAGTCCAATT TGTTAAAGAC AGGATATCAG TGGTCCAGGC 60
TCTAGTTTTG ACTCAACAAT ATCACCAGCT GAAGCCTATA GAGTACGAGC CATAGATAAA 120
ATAAAAGATT TTATTTAGTC TCCAGAAAAA GGGGGGAATG AAAGACCCCA CCTGTAGGTT 180
TGGCAAGCTA GCTTAAGTAA CGCCATTTTG CAAGGCATGG AAAAATACAT AACTGAGAAT 240
AGAGAAGTTC AGATCAAGGT CAGGAACAGA TGGAACAGCT GAATATGGGC CAAACAGGAT 300
ATCTGTGGTA AGCAGTTCCT GCCCCGGCTC AGGGCCAAGA ACAGATGGAA CAGCTGAATA 360
TGGGCCAAAC AGGATATCTG TGGTAAGCAG TTCCTGCCCC GGCTCAGGGC CAAGAACAGA 420
TGGTCCCCAG ATGCGGTCCA GCCCTCAGCA GTTTCTAGAG AACCATCAGA TGTTTCCAGG 480
GTGCCCCAAG GACCTGAAAT GACCCTGTGC CTTATTTGAA CTAACCAATC AGTTCGCTTC 540
TCGCTTCTGT TCGCGCGCTT CTGCTCCCCG AGCTCAATAA AAGAGCCCAC AACCCCTCAC 600
TCGGGGCGCC AGTCCTCCGA TTGACTGAGT CGCCCGGGTA CCCGTGTATC CAATAAACCC 660
TCTTGCAGTT GCATCCGACT TGTGGTCTCG CTGTTCCTTG GGAGGGTCTC CTCTGAGTGA 720
TTGACTACCC GTCAGCGGGG GTCTTTCATT TGGGGGCTCG TCCGGGATCG GGAGACCCCT 780
GCCCAGGGAC CACCGACCCA CCACCGGGAG GTAAGCTGGC CAGCAACTTA TCTGTGTCTG 840
TCCGATTGTC TAGTGTCTAT GACTGATTTT ATGCGCCTGC GTCGGTACTA GTTAGCTAAC 900
TAGCTCTGTA TCTGGCGGAC CCGTGGTGGA ACTGACGAGT TCGGAACACC CGGCCGCAAC 960
CCTGGGAGAC GTCCCAGGGA CTTCGGGGGC CGTTTTTGTG GCCCGACCTG AGTCCAAAAA 1020
TCCCGATCGT TTTGGACTCT TTGGTGCACC CCCCTTAGAG GAGGGATATG TGGTTCTGGT 1080
AGGAGACGAG AACCTAAAAC AGTTCCCGCC TCCGTCTGAA TTTTTGCTTT CGGTTTGGGA 1140
CCGAAGCCGC GCCGCGCGTC TTGTCTGCTG CAGCATCGTT CTGTGTTGTC TCTGTCTGAC 1200
TGTGTTTCTG TATTTGTCTG AGAATATGGG CCAGACTGTT ACCACTCCCT TAAGTTTGAC 1260
CTTAGGTCAC TGGAAAGATG TCGAGCGGAT CGCTCACAAC CAGTCGGTAG ATGTCAAGAA 1320
GAGACGTTGG GTTACCTTCT GCTCTGCAGA ATGGCCAACC TTTAACGTCG GATCGCCGCG 1380
AGACGGCACC TTTAACCGAG ACCTCATCAC CCAGGTTAAG ATCAAGGTCT TTTCACCTGG 1440
CCCGCATGGA CACCCAGACC AGGTCCCCTA CATCGTGACC TGGGAAGCCT TGGCTTTTGA 1500
CCCCCCTCCC TGGGTCAAGC CCTTTGTACA CCCTAAGCCT CCGCCTCCTC TTCCTCCATC 1560
CGCCCCGTCT CTCCCCCTTG AACCTCCTCG TTCGACCCCG CCTCGATCCT CCCTTTATCC 1620
AGCCCTCACT CCTTCTCGAC GGTATACAGA CATGATAAGA TACATTGATG AGTTTGGACA 1680
AACCACAACT AGAATGCAGT GAAAAAAATG CTTTATTTGT GAAATTTGTG ATGCTATTGC 1740
TTTATTTGTA ACCATTATAA GCTGCAATAA ACAAGTTGGG GTGGGCGAAG AACTCCAGCA 1800
TGAGATCCCC GCGCTGGAGG ATCATCCAGC CGGCGAACGT GGCGAGAAAG GAAGGGAAGA 1860
AAGCGAAAGG AGCGGGCGCT AGGGCGCTGG CAAGTGTAGC GGTCACGCTG CGCGTAACCA 1920
CCACACCCGC CGCGCTTAAT GCGCCGCTAC AGGGCGCGTG GGGATACCCC CTAGAGCCCC 1980
AGCTGGTTCT TTCCGCCTCA GAAGCCATAG AGCCCACCGC ATCCCCAGCA TGCCTGCTAT 2040
TGTCTTCCCA ATCCTCCCCC TTGCTGTCCT GCCCCACCCC ACCCCCCAGA ATAGAATGAC 2100
ACCTACTCAG ACAATGCGAT GCAATTTCCT CATTTTATTA GGAAAGGACA GTGGGAGTGG 2160
CACCTTCCAG GGTCAAGGAA GGCACGGGGG AGGGGCAAAC AACAGATGGC TGGCAACTAG 2220
AAGGCACAGT CGAGGCTGAT CAGCGAGCTC TAGCATTTAG GTGACACTAT AGAATAGGGC 2280
CCTCTAGATG CATGCTCGAG CGGCCGCCAG TGTGATGGAT ATCTGCAGAA TTCTCAGAAG 2340
AACTCGTCAA GAAGGCGATA GAAGGCGATG CGCTGCGAAT CGGGAGCGGC GATACCGTAA 2400
AGCACGAGGA AGCGGTCAGC CCATTCGCCG CCAAGCTCTT CAGCAATATC ACGGGTAGCC 2460
AACGCTATGT CCTGATAGCG GTCCGCCACA CCCAGCCGGC CACAGTCGAT GAATCCAGAA 2520
AAGCGGCCAT TTTCCACCAT GATATTCGGC AAGCAGGCAT CGCCATGGGT CACGACGAGA 2580
TCCTCGCCGT CGGGCATGCG CGCCTTGAGC CTGGCGAACA GTTCGGCTGG CGCGAGCCCC 2640
TGATGCTCTT CGTCCAGATC ATCCTGATCG ACAAGACCGG CTTCCATCCG AGTACGTGCT 2700
CGCTCGATGC GATGTTTCGC TTGGTGGTCG AATGGGCAGG TAGCCGGATC AAGCGTATGC 2760
AGCCGCCGCA TTGCATCAGC CATGATGGAT ACTTTCTCGG CAGGAGCAAG GTGAGATGAC 2820
AGGAGATCCT GCCCCGGCAC TTCGCCCAAT AGCAGCCAGT CCCTTCCCGC TTCAGTGACA 2880
ACGTCGAGCA CAGCTGCGCA AGGAACGCCC GTCGTGGCCA GCCACGATAG CCGCGCTGCC 2940
TCGTCCTGCA GTTCATTCAG GGCACCGGAC AGGTCGGTCT TGACAAAAAG AACCGGGCGC 3000
CCCTGCGCTG ACAGCCGGAA CACGGCGGCA TCAGAGCAGC CGATTGTCTG TTGTGCCCAG 3060
TCATAGCCGA ATAGCCTCTC CACCCAAGCG GCCGGAGAAC CTGCGTGCAA TCCATCTTGT 3120
TCGATAGCAC CAGCACCGTT GTACAGTTCA TCCATGCCAT GTGTAATCCC AGCAGCTGTT 3180
ACAAACTCAA GAAGGACCAT GTGGTCTCTC TTTTCGTTGG GATCTTTCGA AAGGGCAGAT 3240
TGTGTGGACA GGTAATGGTT GTCTGGTAAA AGGACAGGGC CATCGCCAAT TGGAGTATTT 3300
TGTTGATAAT GGTCTGCTAG TTGAACGCTT CCATCTTCAA TGTTGTGGCG GGTCTTGAAG 3360
TTCACTTTGA TTCCATTCTT TTGTTTGTCT GCCATGATGT ATACATTGTG TGAGTTATAG 3420
TTGTATTCCA ATTTGTGTCC CAGAATGTTG CCATCTTCCT TGAAGTCAAT ACCTTTTAAC 3480
TCGATTCTAT TAACAAGGGT ATCACCTTCA AACTTGACTT CAGCACGTGT CTTGTAGTTG 3540
CCGTCATCTT TGAAGAAGAT GGTCCTTTCC TGTACATAAC CTTCGGGCAT GGCACTCTTG 3600
AAAAAGTCAT GCCGTTTCAT ATGATCCGGG TATCTTGAAA AGCATTGAAC [illegible] 3660
AGAGTAGTGA CTAGTGTTGG CCATGGAACA GGCAGTTTGC CAGTAGTGCA GATGAACTTC 3720
AGGGTAAGTT TTCCGTATGT TGCATCACCT TCACCCTCTC CACTGACAGA GAACTTGTGG 3780
CCGTTAACAT CACCATCTAA TTCAACAAGA ATTGGGACAA CTCCAGTGAA GAGTTCTTCT 3840
CCTTTGCTAG CCATTTCTTG CGCGCCCGCG GAGGCTGGAT ACGTGTCCCG GTCTGCAGGT 3900
CGAAAGGCCC GGAGATGAGG AAGAGGAGAA CAGCGCGGCA GACGTGCGCT TTTGAAGCGT 3960
GCAGAATGCC GGGCTCCGGA GGACCTTCGC GCCCGCCCCG CCCCTGAGCC CGCCCCTGAG 4020
CCCGCCCCCG GACCCACCCC TTCCCAGCCT CTGAGCCCAG AAAGCGAAGG AGCCAAGCTG 4080 
CTATTGGCCG CTGCCCCAAA GGCCTACCCG CTTCCATTGC TCAGCGGTGC TGTCCATCTG 4140 
CACGAGACTA GTGAGACGTG CTACTTCCAT TTGTCACGTC CTGCACGACG CGAGCTGCGG 4200 
GGCGGGGGGG AACTTCCTGA CTAGGGGAGG AGTAGAAGGT GGCGCGAAGG GGCCACCAAA 4260 
GAAGGGAGCC GGTTGGCGCT ACCGGTGGAT GTGGAATGTG TGCGAGGCCA GAGGCCACTT 4320 
GTGTAGCGCC AAGTGCCAGC GGGGCTGCTA AAGCGCATGC TCCAGACTGC CTTGGGAAAA 4380 

GCGCCTCCCC TACCCGGTAG AATTCGATAT CAAGCTTATC GATACCGTCG AGATCTCCCG 4440 

ATCCGTCGAG GTCGACGGTA TCGATTAGTC CAATTTGTTA AAGACAGGAT ATCAGTGGTC 4500 

CAGGCTCTAG TTTTGACTCA ACAATATCAC CAGCTGAAGC CTATAGAGTA CGAGCCATAG 4560 

ATAAAATAAA AGATTTTATT TAGTCTCCAG AAAAAGGGGG GAATGAAAGA CCCCACCTGT 4620 

AGGTTTGGCA AGCTAGCTTA AGTAACGCCA TTTTGCAAGG CATGGAAAAA TACATAACTG 4680 

AGAATAGAGA AGTTCAGATC GGGATCCCAA TTCTTTCGGA CTTTTGAAAG TGATGGTGGT 4740

GGGGGAAGGA TTCGAACCTT CGAAGTCGAT GACGGCAGAT TTAGAGTCTG CTCCCTTTGG 4800

CCGCTCGGGA ACCCCACCAC GGGTAATGCT TTTACTGGCC TGCTCCCTTA TCGGGAAGCG 4860 

GGGCGCATCA TATCAAATGA CGCGCCGCTG TAAAGTGTTA CGTTGAGAAA GAATTGGGAT 4920

CCCGATCAAG GTCAGGAACA GATGGAACAG CTAGAGAACC ATCAGATGTT TCCAGGGTGC 4980 

CCCAAGGACC TGAAATGACC CTGTGCCTTA TTTGAACTAA CCAATCAGTT CGCTTCTCGC 5040

TTCTGTTCGC GCGCTTCTGC TCCCCGAGCT CAATAAAAGA GCCCACAACC CCTCACTCGG 5100 

GGCGCCAGTC CTCCGATTGA CTGAGTCGCC CGGGTACCCG TGTATCCAAT AAACCCTCTT 5160 

GCAGTTGCAT CCGACTTGTG GTCTCGCTGT TCCTTGGGAG GGTCTCCTCT GAGTGATTGA 5220 

CTACCCGTCA GCGGGGGTCT TTCACCCAGA GTTTGGAACT TACTGTCTTC TTGGGACCTG 5280 

CAGCCCGGGG GATCCACTAG TTCTAGAGCG GCCGCCACCG CGGTGGATTC TGCCTCGCGC 5340

GTTTCGGTGA TGACGGTGAA AACCTCTGAC ACATGCAGCT CCCGGAGACG GTCACAGCTT 5400

GTCTGTAAGC GGATGCCGGG AGCAGACAAG CCCGTCAGGG CGCGTCAGCG GGTGTTGGCG 5460 

GGTGTCGGGG CGCAGCCATG ACCCAGTCAC GTAGCGATAG CGGAGTGTAT ACTGGCTTAA 5520 

CTATGCGGCA TCAGAGCAGA TTGTACTGAG AGTGCACCAT ATGCGGTGTG AAATACCGCA 5580

CAGATGCGTA AGGAGAAAAT ACCGCATCAG GCGCTCTTCC GCTTCCTCGC TCACTGACTC 5640

GCTGCGCTCG GTCGTTCGGC TGCGGCGAGC GGTATCAGCT CACTCAAAGG CGGTAATACG 5700 

GTTATCCACA GAATCAGGGG ATAACGCAGG AAAGAACATG TGAGCAAAAG GCCAGCAAAA 5760

GGCCAGGAAC CGTAAAAAGG CCGCGTTGCT GGCGTTTTTC CATAGGCTCC GCCCCCCTGA 5820

CGAGCATCAC AAAAATCGAC GCTCAAGTCA GAGGTGGCGA AACCCGACAG GACTATAAAG 5880

ATACCAGGCG TTTCCCCCTG GAAGCTCCCT CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT 5940

TACCGGATAC CTGTCCGCCT TTCTCCCTTC GGGAAGCGTG GCGCTTTCTC AATGCTCACG 6000 

CTGTAGGTAT CTCAGTTCGG TGTAGGTCGT TCGCTCCAAG CTGGGCTGTG TGCACGAACC 6060
CCCCGTTCAG CCCGACCGCT GCGCCTTATC CGGTAACTAT CGTCTTGAGT CCAACCCGGT 6120
AAGACACGAC TTATCGCCAC TGGCAGCAGC CACTGGTAAC AGGATTAGCA GAGCGAGGTA 6180
TGTAGGCGGT GCTACAGAGT TCTTGAAGTG GTGGCCTAAC TACGGCTACA CTAGAAGGAC 6240
AGTATTTGGT ATCTGCGCTC TGCTGAAGCC AGTTACCTTC GGAAAAAGAG TTGGTAGCTC 6300
TTGATCCGGC AAACAAACCA CCGCTGGTAG CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT 6360
TACGCGCAGA AAAAAAGGAT CTCAAGAAGA TCCTTTGATC TTTTCTACGG GGTCTGACGC 6420
TCAGTGGAAC GAAAACTCAC GTTAAGGGAT TTTGGTCATG AGATTATCAA AAAGGATCTT 6480
CACCTAGATC CTTTTAAATT AAAAATGAAG TTTTAAATCA ATCTAAAGTA TATATGAGTA 6540
AACTTGGTCT GACAGTTACC AATGCTTAAT CAGTGAGGCA CCTATCTCAG CGATCTGTCT 6600
ATTTCGTTCA TCCATAGTTG CCTGACTCCC CGTCGTGTAG ATAACTACGA TACGGGAGGG 6660
CTTACCATCT GGCCCCAGTG CTGCAATGAT ACCGCGAGAC CCACGCTCAC CGGCTCCAGA 6720
TTTATCAGCA ATAAACCAGC CAGCCGGAAG GGCCGAGCGC AGAAGTGGTC CTGCAACTTT 6780
ATCCGCCTCC ATCCAGTCTA TTAATTGTTG CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT 6840
TAATAGTTTG CGCAACGTTG TTGCCATTGC TGCAGGCATC GTGGTGTCAC GCTCGTCGTT 6900
TGGTATGGCT TCATTCAGCT CCGGTTCCCA ACGATCAAGG CGAGTTACAT GATCCCCCAT 6960
GTTGTGCAAA AAAGCGGTTA GCTCCTTCGG TCCTCCGATC GTTGTCAGAA GTAAGTTGGC 7020
CGCAGTGTTA TCACTCATGG TTATGGCAGC ACTGCATAAT TCTCTTACTG TCATGCCATC 7080
CGTAAGATGC TTTTCTGTGA CTGGTGAGTA CTCAACCAAG TCATTCTGAG AATAGTGTAT 7140
GCGGCGACCG AGTTGCTCTT GCCCGGCGTC AACACGGGAT AATACCGCGC CACATAGCAG 7200
AACTTTAAAA GTGCTCATCA TTGGAAAACG TTCTTCGGGG CGAAAACTCT CAAGGATCTT 7260
ACCGCTGTTG AGATCCAGTT CGATGTAACC CACTCGTGCA CCCAACTGAT CTTCAGCATC 7320
TTTTACTTTC ACCAGCGTTT CTGGGTGAGC AAAAACAGGA AGGCAAAATG CCGCAAAAAA 7380
GGGAATAAGG GCGACACGGA AATGTTGAAT ACTCATACTC TTCCTTTTTC AATATTATTG 7440
AAGCATTTAT CAGGGTTATT GTCTCATGAG CGGATACATA TTTGAATGTA TTTAGAAAAA 7500
TAAACAAATA GGGGTTCCGC GCACATTTCC CCGAAAAGTG CCACCTGACG TCTAAGAAAC 7560
CATTATTATC ATGACATTAA CCTATAAAAA TAGGCGTATC ACGAGGCCCT TTCGTCT 7617
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(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15581 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..15581

(D) OTHER INFORMATION: /note= "pNLnSGll"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:

[illegible] AATTTGGTCC CAAAAAAGAC AAGAGATCCT TGATCTGTGG ATCTACCACA 60
[illegible] ?GGCAGAACT ACACACCAGG GCCAGGGATC AGATATCCAC 120
[illegible] ATGGTGCTTC AAGTTAGTAC CAGTTGAACC AGAGCAAGTA GAAGAGGCCA 180
[illegible] TTGTTACACC CTATGAGCCA GCATGGGATG GAGGACCCGG 240
[illegible] ATTAGTGTGG AAGTTTGACA GCCTCCTAGC ATTTCGTCAC ATGGCCCGAG 300
[illegible] AAAGACTGCT GACATCGAGC TTTCTACAAG GGACTTTCCG 360
[illegible] TGTGGCCTGG GCGGGACTGG GGAGTGGCGA GCCCTCAGAT 420
GCTA?ATATA [illegible] TTTTGCCTGT ACTGGGTCTC TCTGGTTAGA CCAGATCTGA 480
GCCTGGGAG? [illegible] CCACTGCTTA AGCCTCAATA AAGCTTGCCT 540
[illegible] AAGTAGTGTG TGCCCGTCTG TTGTGTGACT CTGGTAACTA GAGATCCCTC 600
AGACCCTTTT AGTCAGTGTG GAAAATCTCT AGCAGTGGCG CCCGAACAGG GACTTGAAAG 660




CGAAAGTAAA GCCAGAGGAG ATCTCTCGAC GCAGGACTCG GCTTGCTGAA GCGCGCACGG 720
CAAGAGGCGA GGGGCGGCGA CTGGTGAGTA CGCCAAAAAT TTTGACTAGC GGAGGCTAGA 780
AGGAGAGAGA TGGGTGCGAG AGCGTCGGTA TTAAGCGGGG GAGAATTAGA TAAATGGGAA 840
AAAATTCGGT TAAGGCCAGG GGGAAAGAAA CAATATAAAC TAAAACATAT AGTATGGGCA 900
AGCAGGGAGC TAGAACGATT CGCAGTTAAT CCTGGCCTTT TAGAGACATC AGAAGGCTGT 960
AGACAAATAC TGGGACAGCT ACAACCATCC CTTCAGACAG GATCAGAAGA ACTTAGATCA 1020
TTATATAATA CAATAGCAGT CCTCTATTGT GTGCATCAAA GGATAGATGT AAAAGACACC 1080
AAGGAAGCCT TAGATAAGAT AGAGGAAGAG CAAAACAAAA GTAAGAAAAA GGCACAGCAA 1140
GCAGCAGCTG ACACAGGAAA CAACAGCCAG GTCAGCCAAA ATTACCCTAT AGTGCAGAAC 1200
CTCCAGGGGC AAATGGTACA TCAGGCCATA TCACCTAGAA CTTTAAATGC ATGGGTAAAA 1260
GTAGTAGAAG AGAAGGCTTT CAGCCCAGAA GTAATACCCA TGTTTTCAGC ATTATCAGAA 1320
GGAGCCACCC CACAAGATTT AAATACCATG CTAAACACAG TGGGGGGACA TCAAGCAGCC 1380
ATGCAAATGT TAAAAGAGAC CATCAATGAG GAAGCTGCAG AATGGGATAG ATTGCATCCA 1440
GTGCATGCAG GGCCTATTGC ACCAGGCCAG ATGAGAGAAC CAAGGGGAAG TGACATAGCA 1500
GGAACTACTA GTACCCTTCA GGAACAAATA GGATGGATGA CACATAATCC ACCTATCCCA 1560
GTAGGAGAAA TCTATAAAAG ATGGATAATC CTGGGATTAA ATAAAATAGT AAGAATGTAT 1620
AGCCCTACCA GCATTCTGGA CATAAGACAA GGACCAAAGG AACCCTTTAG AGACTATGTA 1680
GACCGATTCT ATAAAACTCT AAGAGCCGAG CAAGCTTCAC AAGAGGTAAA AAATTGGATG 1740
ACAGAAACCT TGTTGGTCCA AAATGCGAAC CCAGATTGTA AGACTATTTT AAAAGCATTG 1800
GGACCAGGAG CGACACTAGA AGAAATGATG ACAGCATGTC AGGGAGTGGG GGGACCCGGC 1860
CATAAAGCAA GAGTTTTGGC TGAAGCAATG AGCCAAGTAA CAAATCCAGC TACCATAATG 1920
ATACAGAAAG GCAATTTTAG GAACCAAAGA AAGACTGTTA AGTGTTTCAA TTGTGGCAAA 1980
GAAGGGCACA TAGCCAAAAA TTGCAGGGCC CCTAGGAAAA AGGGCTGTTG GAAATGTGGA 2040
AAGGAAGGAC ACCAAATGAA AGATTGTACT GAGAGACAGG CTAATTTTTT AGGGAAGATC 2100
TGGCCTTCCC ACAAGGGAAG GCCAGGGAAT TTTCTTCAGA GCAGACCAGA GCCAACAGCC 2160
CCACCAGAAG AGAGCTTCAG GTTTGGGGAA GAGACAACAA CTCCCTCTCA GAAGCAGGAG 2220
CCGATAGACA AGGAACTGTA TCCTTTAGCT TCCCTCAGAT CACTCTTTGG CAGCGACCCC 2280
TCGTCACAAT AAAGATAGGG GGGCAATTAA AGGAAGCTCT ATTAGATACA GGAGCAGATG 2340
ATACAGTATT AGAAGAAATG AATTTGCCAG GAAGATGGAA ACCAAAAATG ATAGGGGGAA 2400
TTGGAGGTTT TATCAAAGTA GGACAGTATG ATCAGATACT CATAGAAATC TGCGGACATA 2460
AAGCTATAGG TACAGTATTA GTAGGACCTA CACCTGTCAA CATAATTGGA AGAAATCTGT 2520
TGACTCAGAT TGGCTGCACT TTAAATTTTC CCATTAGTCC TATTGAGACT GTACCAGTAA 2580
AATTAAAGCC AGGAATGGAT GGCCCAAAAG TTAAACAATG GCCATTGACA GAAGAAAAAA 2640
TAAAAGCATT AGTAGAAATT TGTACAGAAA TGGAAAAGGA AGGAAAAATT TCAAAAATTG 2700
GGCCTGAAAA TCCATACAAT ACTCCAGTAT TTGCCATAAA GAAAAAAGAC AGTACTAAAT 2760
GGAGAAAATT AGTAGATTTC AGAGAACTTA ATAAGAGAAC TCAAGATTTC TGGGAAGTTC 2820
AATTAGGAAT ACCACATCCT GCAGGGTTAA AACAGAAAAA ATCAGTAACA GTACTGGATG 2880
TGGGCGATGC ATATTTTTCA GTTCCCTTAG ATAAAGACTT CAGGAAGTAT ACTGCATTTA 2940
CCATACCTAG TATAAACAAT GAGACACCAG GGATTAGATA TCAGTACAAT GTGCTTCCAC 3000
AGGGATGGAA AGGATCACCA GCAATATTCC AGTGTAGCAT GACAAAAATC TTAGAGCCTT 3060
TTAGAAAACA AAATCCAGAC ATAGTCATCT ATCAATACAT GGATGATTTG TATGTAGGAT 3120
CTGACTTAGA AATAGGGCAG CATAGAACAA AAATAGAGGA ACTGAGACAA CATCTGTTGA 3180
GGTGGGGATT TACCACACCA GACAAAAAAC ATCAGAAAGA ACCTCCATTC CTTTGGATGG 3240
GTTATGAACT CCATCCTGAT AAATGGACAG TACAGCCTAT AGTGCTGCCA GAAAAGGACA 3300
GCTGGACTGT CAATGACATA CAGAAATTAG TGGGAAAATT GAATTGGGCA AGTCAGATTT 3360
ATGCAGGGAT TAAAGTAAGG CAATTATGTA AACTTCTTAG GGGAACCAAA GCACTAACAG 3420
AAGTAGTACC ACTAACAGAA GAAGCAGAGC TAGAACTGGC AGAAAACAGG GAGATTCTAA 3480
AAGAACCGGT ACATGGAGTG TATTATGACC CATCAAAAGA CTTAATAGCA GAAATACAGA 3540
AGCAGGGGCA AGGCCAATGG ACATATCAAA TTTATCAAGA GCCATTTAAA AATCTGAAAA 3600
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CAGGAAAATA TGCAAGAATG AAGGGTGCCC ACACTAATGA TGTGAAACAA TTAACAGAGG 3660 

CAGTACAAAA AATAGCCACA GAAAGCATAG TAATATGGGG AAAGACTCCT AAATTTAAAT 3720 

TACCCATACA AAAGGAAACA TGGGAAGCAT GGTGGACAGA GTATTGGCAA GCCACCTGGA 3780 

TTCCTGAGTG GGAGTTTGTC AATACCCCTC CCTTAGTGAA GTTATGGTAC CAGTTAGAGA 3840 

AAGAACCCAT AATAGGAGCA GAAACTTTCT ATGTAGATGG GGCAGCCAAT AGGGAAACTA 3900 

AATTAGGAAA AGCAGGATAT GTAACTGACA GAGGAAGACA AAAAGTTGTC CCCCTAACGG 3960 

ACACAACAAA TCAGAAGACT GAGTTACAAG CAATTCATCT AGCTTTGCAG GATTCGGGAT 4020 

TAGAAGTAAA CATAGTGACA GACTCACAAT ATGCATTGGG AATCATTCAA GCACAACCAG 4080

ATAAGAGTGA ATCAGAGTTA GTCAGTCAAA TAATAGAGCA GTTAATAAAA AAGGAAAAAG 4140

TCTACCTGGC ATGGGTACCA GCACACAAAG GAATTGGAGG AAATGAACAA GTAGATGGGT 4200 

TGGTCAGTGC TGGAATCAGG AAAGTACTAT TTTTAGATGG AATAGATAAG GCCCAAGAAG 4260 

AACATGAGAA ATATCACAGT AATTGGAGAG CAATGGCTAG TGATTTTAAC CTACCACCTG 4320 

TAGTAGCAAA AGAAATAGTA GCCAGCTGTG ATAAATGTCA GCTAAAAGGG GAAGCCATGC 4380

ATGGACAAGT AGACTGTAGC CCAGGAATAT GGCAGCTAGA TTGTACACAT TTAGAAGGAA 4440

AAGTTATCTT GGTAGCAGTT CATGTAGCCA GTGGATATAT AGAAGCAGAA GTAATTCCAG 4500

CAGAGACAGG GCAAGAAACA GCATACTTCC TCTTAAAATT AGCAGGAAGA TGGCCAGTAA 4560

AAACAGTACA TACAGACAAT GGCAGCAATT TCACCAGTAC TACAGTTAAG GCCGCCTGTT 4620 

GGTGGGCGGG GATCAAGCAG GAATTTGGCA TTCCCTACAA TCCCCAAAGT CAAGGAGTAA 4680 

TAGAATCTAT GAATAAAGAA TTAAAGAAAA TTATAGGACA GGTAAGAGAT CAGGCTGAAC 4740

ATCTTAAGAC AGCAGTACAA ATGGCAGTAT TCATCCACAA TTTTAAAAGA AAAGGGGGGA 4800 

TTGGGGGGTA CAGTGCAGGG GAAAGAATAG TAGACATAAT AGCAACAGAC ATACAAACTA 4860

AAGAATTACA AAAACAAATT ACAAAAATTC AAAATTTTCG GGTTTATTAC AGGGACAGCA 4920

GAGATCCAGT TTGGAAAGGA CCAGCAAAGC TCCTCTGGAA AGGTGAAGGG GCAGTAGTAA 4980

TACAAGATAA TAGTGACATA AAAGTAGTGC CAAGAAGAAA AGCAAAGATC ATCAGGGATT 5040

ATGGAAAACA GATGGCAGGT GATGATTGTG TGGCAAGTAG ACAGGATGAG GATTAACACA 5100 

TGGAAAAGAT TAGTAAAACA CCATATGTAT ATTTCAAGGA AAGCTAAGGA CTGGTTTTAT 5160 

AGACATCACT ATGAAAGTAC TAATCCAAAA ATAAGTTCAG AAGTACACAT CCCACTAGGG 5220 

GATGCTAAAT TAGTAATAAC AACATATTGG GGTCTGCATA CAGGAGAAAG AGACTGGCAT 5280 

TTGGGTCAGG GAGTCTCCAT AGAATGGAGG AAAAAGAGAT ATAGCACACA AGTAGACCCT 5340 

GACCTAGCAG ACCAACTAAT TCATCTGCAC TATTTTGATT GTTTTTCAGA ATCTGCTATA 5400 

AGAAATACCA TATTAGGACG TATAGTTAGT CCTAGGTGTG AATATCAAGC AGGACATAAC 5460 

AAGGTAGGAT CTCTACAGTA CTTGGCACTA GCAGCATTAA TAAAACCAAA ACAGATAAAG 5520 

CCACCTTTGC CTAGTGTTAG GAAACTGACA GAGGACAGAT GGAACAAGCC CCAGAAGACC 5580 

AAGGGCCACA GAGGGAGCCA TACAATGAAT GGACACTAGA GCTTTTAGAG GAACTTAAGA 5640

GTGAAGCTGT TAGACATTTT CCTAGGATAT GGCTCCATAA CTTAGGACAA CATATCTATG 5700
AAACTTACGG GGATACTTGG GCAGGAGTGG AAGCCATAAT AAGAATTCTG CAACAACTGC 5760
TGTTTATCCA TTTCAGAATT GGGTGTCGAC ATAGCAGAAT AGGCGTTACT CGACAGAGGA 5820
GAGCAAGAAA TGGAGCCAGT AGATCCTAGA CTAGAGCCCT GGAAGCATCC AGGAAGTCAG 5880
CCTAAAACTG CTTGTACCAA TTGCTATTGT AAAAAGTGTT GCTTTCATTG CCAAGTTTGT 5940
TTCATGACAA AAGCCTTAGG CATCTCCTAT GGCAGGAAGA AGCGGAGACA GCGACGAAGA 6000
GCTCATCAGA ACAGTCAGAC TCATCAAGCT TCTCTATCAA AGCAGTAAGT AGTACATGTA 6060
ATGCAACCTA TAATAGTAGC AATAGTAGCA TTAGTAGTAG CAATAATAAT AGCAATAGTT 6120
GTGTGGTCCA TAGTAATCAT AGAATATAGG AAAATATTAA GACAAAGAAA AATAGACAGG 6180
TTAATTGATA GACTAATAGA AAGAGCAGAA GACAGTGGCA ATGAGAGTGA AGGAGAAGTA 6240
TCAGCACTTG TGGAGATGGG GGTGGAAATG GGGCACCATG CTCCTTGGGA TATTGATGAT 6300
CTGTAGTGCT ACAGAAAAAT TGTGGGTCAC AGTCTATTAT GGGGTACCTG TGTGGAAGGA 6360
AGCAACCACC ACTCTATTTT GTGCATCAGA TGCTAAAGCA TATGATACAG AGGTACATAA 6420
TGTTTGGGCC ACACATGCCT GTGTACCCAC AGACCCCAAC CCACAAGAAG TAGTATTGGT 6480
AAATGTGACA GAAAATTTTA ACATGTGGAA AAATGACATG GTAGAACAGA TGCATGAGGA 6540
TATAATCAGT TTATGGGATC AAAGCCTAAA GCCATGTGTA AAATTAACCC CACTCTGTGT 6600
TAGTTTAAAG TGCACTGATT TGAAGAATGA TACTAATACC AATAGTAGTA GCGGGAGAAT 6660
GATAATGGAG AAAGGAGAGA TAAAAAACTG CTCTTTCAAT ATCAGCACAA GCATAAGAGA 6720
TAAGGTGCAG AAAGAATATG CATTCTTTTA TAAACTTGAT ATAGTACCAA TAGATAATAC 6780
CAGCTATAGG TTGATAAGTT GTAACACCTC AGTCATTACA CAGGCCTGTC CAAAGGTATC 6840
CTTTGAGCCA ATTCCCATAC ATTATTGTGC CCCGGCTGGT TTTGCGATTC TAAAATGTAA 6900
TAATAAGACG TTCAATGGAA CAGGACCATG TACAAATGTC AGCACAGTAC AATGTACACA 6960
TGGAATCAGG CCAGTAGTAT CAACTCAACT GCTGTTAAAT GGCAGTCTAG CAGAAGAAGA 7020
TGTAGTAATT AGATCTGCCA ATTTCACAGA CAATGCTAAA ACCATAATAG TACAGCTGAA 7080
CACATCTGTA GAAATTAATT GTACAAGACC CAACAACAAT ACAAGAAAAA GTATCCGTAT 7140
CCAGAGGGGA CCAGGGAGAG CATTTGTTAC AATAGGAAAA ATAGGAAATA TGAGACAAGC 7200
ACATTGTAAC ATTAGTAGAG CAAAATGGAA TGCCACTTTA AAACAGATAG CTAGCAAATT 7260
AAGAGAACAA TTTGGAAATA ATAAAACAAT AATCTTTAAG CAATCCTCAG GAGGGGACCC 7320
AGAAATTGTA ACGCACAGTT TTAATTGTGG AGGGGAATTT TTCTACTGTA ATTCAACACA 7380
ACTGTTTAAT AGTACTTGGT TTAATAGTAC TTGGAGTACT GAAGGGTCAA ATAACACTGA 7440
AGGAAGTGAC ACAATCACAC TCCCATGCAG AATAAAACAA TTTATAAACA TGTGGCAGGA 7500
AGTAGGAAAA GCAATGTATG CCCCTCCCAT CAGTGGACAA ATTAGATGTT CATCAAATAT 7560
TACTGGGCTG CTATTAACAA GAGATGGTGG TAATAACAAC AATGGGTCCG AGATCTTCAG 7620
ACCTGGAGGA GGCGATATGA GGGACAATTG GAGAAGTGAA TTATATAAAT ATAAAGTAGT 7680
AAAAATTGAA CCATTAGGAG TAGCACCCAC CAAGGCAAAG AGAAGAGTGG TGCAGAGAGA 7740
AAAAAGAGCA GTGGGAATAG GAGCTTTGTT CCTTGGGTTC TTGGGAGCAG CAGGAAGCAC 7800
TATGGGCGCA GCGTCAATGA CGCTGACGGT ACAGGCCAGA CAATTATTGT CTGATATAGT 7860

GCAGCAGCAG AACAATTTGC TGAGGGCTAT TGAGGCGCAA CAGCATCTGT TGCAACTCAC 7920

AGTCTGGGGC ATCAAACAGC TCCAGGCAAG AATCCTGGCT GTGGAAAGAT ACCTAAAGGA 7980

TCAACAGCTC CTGGGGATTT GGGGTTGCTC TGGAAAACTC ATTTGCACCA CTGCTGTGCC 8040

TTGGAATGCT AGTTGGAGTA ATAAATCTCT GGAACAGATT TGGAATAACA TGACCTGGAT 8100

GGAGTGGGAC AGAGAAATTA ACAATTACAC AAGCTTAATA CACTCCTTAA TTGAAGAATC 8160

GCAAAACCAG CAAGAAAAGA ATGAACAAGA ATTATTGGAA TTAGATAAAT GGGCAAGTTT 8220

GTGGAATTGG TTTAACATAA CAAATTGGCT GTGGTATATA AAATTATTCA TAATGATAGT 8280

AGGAGGCTTG GTAGGTTTAA GAATAGTTTT TGCTGTACTT TCTATAGTGA ATAGAGTTAG 8340

GCAGGGATAT TCACCATTAT CGTTTCAGAC CCACCTCCCA ATCCCGAGGG GACCCGACAG 8400

GCCCGAAGGA ATAGAAGAAG AAGGTGGAGA GAGAGACAGA GACAGATCCA TTCGATTAGT 8460

GAACGGATCC TTAGCACTTA TCTGGGACGA TCTGCGGAGC CTGTGCCTCT TCAGCTACCA 8520

CCGCTTGAGA GACTTACTCT TGATTGTAAC GAGGATTGTG GAACTTCTGG GACGCAGGGG 8580

GTGGGAAGCC CTCAAATATT GGTGGAATCT CCTACAGTAT TGGAGTCAGG AACTAAAGAA 8640

TAGTGCTGTT AACTTGCTCA ATGCCACAGC CATAGCAGTA GCTGAGGGGA CAGATAGGGT 8700

TATAGAAGTA TTACAAGCAG CTTATAGAGC TATTCGCCAC ATACCTAGAA GAATAAGACA 8760

GGGCTTGGAA AGGATTTTGC TATAAGATGG GTGGCAAGTG GTCAAAAAGT AGTGTGATTG 8820

GATGGCCTGC TGTAAGGGAA AGAATGAGAC GAGCTGAGCA AGAAATGGCT AGCAAAGGAG 8880

AAGAACTCTT CACTGGAGTT GTCCCAATTC TTGTTGAATT AGATGGTGAT GTTAACGGCC 8940

ACAAGTTCTC TGTCAGTGGA GAGGGTGAAG GTGATGCAAC ATACGGAAAA CTTACCCTGA 9000

AGTTCATCTG CACTACTGGC AAACTGCCTG TTCCATGGCC AACACTTGTC ACTACTCTCT 9060

CTTATGGTGT TCAATGCTTT TCAAGATACC CGGATCATAT GAAACGGCAT GACTTTTTCA 9120

AGAGTGCCAT GCCCGAAGGT TATGTACAGG AAAGGACCAT CTTCTTCAAA GATGACGGCA 9180

ACTACAAGAC ACGTGCTGAA GTCAAGTTTG AAGGTGATAC CCTTGTTAAT AGAATCGAGT 9240

TAAAAGGTAT TGACTTCAAG GAAGATGGCA ACATTCTGGG ACACAAATTG GAATACAACT 9300

ATAACTCACA CAATGTATAC ATCATGGCAG ACAAACAAAA GAATGGAATC AAAGTGAACT 9360

TCAAGACCCG CCACAACATT GAAGATGGAA GCGTTCAACT AGCAGACCAT TATCAACAAA 9420

ATACTCCAAT TGGCGATGGC CCTGTCCTTT TACCAGACAA CCATTACCTG TCCACACAAT 9480

CTGCCCTTTC GAAAGATCCC AACGAAAAGA GAGACCACAT GGTCCTTCTT GAGTTTGTAA 9540

CAGCTGCTGG GATTACACAT GGCATGGATG AACTGTACAA CGGACTCGAG ACCTAGAAAA 9600

ACATGGAGCA ATCACAAGTA GCAATACAGC AGCTAACAAT GCTGCTTGTG CCTGGCTAGA 9660

AGCACAAGAG GAGGAAGAGG TGGGTTTTCC AGTCACACCT CAGGTACCTT TAAGACCAAT 9720

GACTTACAAG GCAGCTGTAG ATCTTAGCCA CTTTTTAAAA GAAAAGGGGG GACTGGAAGG 9780

GCTAATTCAC TCCCAAAGAA GACAAGATAT CCTTGATCTG TGGATCTACC ACACACAAGG 9840

CTACTTCCCT GATTGGCAGA ACTACACACC AGGGCCAGGG GTCAGATATC CACTGACCTT 9900
TGGATGGTGC TACAAGCTAG TACCAGTTGA GCCAGATAAG GTAGAAGAGG CCAATAAAGG 9960

AGAGAACACC AGCTTGTTAC ACCCTGTGAG CCTGCATGGA ATGGATGACC CTGAGAGAGA 10020

AGTGTTAGAG TGGAGGTTTG ACAGCCGCCT AGCATTTCAT CACGTGGCCC GAGAGCTGCA 10080

TCCGGAGTAC TTCAAGAACT GCTGACATCG AGCTTGCTAC AAGGGACTTT CCGCTGGGGA 10140

CTTTCCAGGG AGGCGTGGCC TGGGCGGGAC TGGGGAGTGG CGAGCCCTCA GATGCTGCAT 10200

ATAAGCAGCT GCTTTTTGCC TGTACTGGGT CTCTCTGGTT AGACCAGATC TGAGCCTGGG 10260

AGCTCTCTGG CTAACTAGGG AACCCACTGC TTAAGCCTCA ATAAAGCTTG CCTTGAGTGC 10320

TTCAAGTAGT GTGTGCCCGT CTGTTGTGTG ACTCTGGTAA CTAGAGATCC CTCAGACCCT 10380

TTTAGTCAGT GTGGAAAATC TCTAGCACCC CCCAGGAGGT AGAGGTTGCA GTGAGCCAAG 10440

ATCGCGCCAC TGCATTCCAG CCTGGGCAAG AAAACAAGAC TGTCTAAAAT AATAATAATA 10500

AGTTAAGGGT ATTAAATATA TTTATACATG GAGGTCATAA AAATATATAT ATTTGGGCTG 10560

GGCGCAGTGG CTCACACCTG CGCCCGGCCC TTTGGGAGGC CGAGGCAGGT GGATCACCTG 10620

AGTTTGGGAG TTCCAGACCA GCCTGACCAA CATGGAGAAA CCCCTTCTCT GTGTATTTTT 10680

AGTAGATTTT ATTTTATGTG TATTTTATTC ACAGGTATTT CTGGAAAACT GAAACTGTTT 10740

TTCCTCTACT CTGATACCAC AAGAATCATC AGCACAGAGG AAGACTTCTG TGATCAAATG 10800

TGGTGGGAGA GGGAGGTTTT CACCAGCACA TGAGCAGTCA GTTCTGCCGC AGACTCGGCG 10860

GGTGTCCTTC GGTTCAGTTC CAACACCGCC TGCCTGGAGA GAGGTCAGAC CACAGGGTGA 10920

GGGCTCAGTC CCCAAGACAT AAACACCCAA GACATAAACA CCCAACAGGT CCACCCCGCC 10980

TGCTGCCCAG GCAGAGCCGA TTCACCAAGA CGGGAATTAG GATAGAGAAA GAGTAAGTCA 11040

CACAGAGCCG GCTGTGCGGG AGAACGGAGT TCTATTATGA CTCAAATCAG TCTCCCCAAG 11100

CATTCGGGGA TCAGAGTTTT TAAGGATAAC TTAGTGTGTA GGGGGCCAGT GAGTTGGAGA 11160

TGAAAGCGTA GGGAGTCGAA GGTGTCCTTT TGCGCCGAGT CAGTTCCTGG GTGGGGGCCA 11220

CAAGATCGGA TGAGCCAGTT TATCAATCCG GGGGTGCCAG CTGATCCATG GAGTGCAGGG 11280

TCTGCAAAAT ATCTCAAGCA CTGATTGATC TTAGGTTTTA CAATAGTGAT GTTACCCCAG 11340

GAACAATTTG GGGAAGGTCA GAATCTTGTA GCCTGTAGCT GCATGACTCC TAAACCATAA 11400

TTTCTTTTTT GTTTTTTTTT TTTTATTTTT GAGACAGGGT CTCACTCTGT CACCTAGGCT 11460

GGAGTGCAGT GGTGCAATCA CAGCTCACTG CAGCCTCAAC GTCGTAAGCT CAAGCGATCC 11520

TCCCACCTCA GCCTGCCTGG TAGCTGAGAC TACAAGCGAC GCCCCAGTTA ATTTTTGTAT 11580

TTTTGGTAGA GGCAGCGTTT TGCCGTGTGG CCCTGGCTGG TCTCGAACTC CTGGGCTCAA 11640

GTGATCCAGC CTCAGCCTCC CAAAGTGCTG GGACAACCGG GGCCAGTCAC TGCACCTGGC 11700

CCTAAACCAT AATTTCTAAT CTTTTGGCTA ATTTGTTAGT CCTACAAAGG CAGTCTAGTC 11760

CCCAGGCAAA AAGGGGGTTT GTTTCGGGAA AGGGCTGTTA CTGTCTTTGT TTCAAACTAT 11820

AAACTAAGTT CCTCCTAAAC TTAGTTCGGC CTACACCCAG GAATGAACAA GGAGAGCTTG 11880

GAGGTTAGAA GCACGATGGA ATTGGTTAGG TCAGATCTCT TTCACTGTCT GAGTTATAAT 11940

TTTGCAATGG TGGTTCAAAG ACTGCCCGCT TCTGACACCA GTCGCTGCAT TAATGAATCG 12000
GCCAACGCGC GGGGAGAGGC GGTTTGCGTA TTGGCGCTCT TCCGCTTCCT CGCTCACTGA 12060

CTCGCTGCGC TCGGTCGTTC GGCTGCGGCG AGCGGTATCA GCTCACTCAA AGGCGGTAAT 12120 

ACGGTTATCC ACAGAATCAG GGGATAACGC AGGAAAGAAC ATGTGAGCAA AAGGCCAGCA 12180 

AAAGGCCAGG AACCGTAAAA AGGCCGCGTT GCTGGCGTTT TTCCATAGGC TCCGCCCCCC 12240

TGACGAGCAT CACAAAAATC GACGCTCAAG TCAGAGGTGG CGAAACCCGA CAGGACTATA 12300

AAGATACCAG GCGTTTCCCC CTGGAAGCTC CCTCGTGCGC TCTCCTGTTC CGACCCTGCC 12360

GCTTACCGGA TACCTGTCCG CCTTTCTCCC TTCGGGAAGC GTGGCGCTTT CTCAATGCTC 12420

ACGCTGTAGG TATCTCAGTT CGGTGTAGGT CGTTCGCTCC AAGCTGGGCT GTGTGCACGA 12480

ACCCCCCGTT CAGCCCGACC GCTGCGCCTT ATCCGGTAAC TATCGTCTTG AGTCCAACCC 12540

GGTAAGACAC GACTTATCGC CACTGGCAGC AGCCACTGGT AACAGGATTA GCAGAGCGAG 12600 

GTATGTAGGC GGTGCTACAG AGTTCTTGAA GTGGTGGCCT AACTACGGCT ACACTAGAAG 12660 

GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC TTCGGAAAAA GAGTTGGTAG 12720 

CTCTTGATCC GGCAAACAAA CCACCGCTGG TAGCGGTGGT TTTTTTGTTT GCAAGCAGCA 12780 

GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTG ATCTTTTCTA CGGGGTCTGA 12840

CGCTCAGTGG AACGAAAACT CACGTTAAGG GATTTTGGTC ATGAGATTAT CAAAAAGGAT 12900

CTTCACCTAG ATCCTTTTAA ATTAAAAATG AAGTTTTAAA TCAATCTAAA GTATATATGA 12960

GTAAACTTGG TCTGACAGTT ACCAATGCTT AATCAGTGAG GCACCTATCT CAGCGATCTG 13020

TCTATTTCGT TCATCCATAG TTGCCTGACT CCCCGTCGTG TAGATAACTA CGATACGGGA 13080

GGGCTTACCA TCTGGCCCCA GTGCTGCAAT GATACCGCGA GACCCACGCT CACCGGCTCC 13140

AGATTTATCA GCAATAAACC AGCCAGCCGG AAGGGCCGAG CGCAGAAGTG GTCCTGCAAC 13200 

TTTATCCGCC TCCATCCAGT CTATTAATTG TTGCCGGGAA GCTAGAGTAA GTAGTTCGCC 13260 

AGTTAATAGT TTGCGCAACG TTGTTGCCAT TGCTACAGGC ATCGTGGTGT CACGCTCGTC 13320 

GTTTGGTATG GCTTCATTCA GCTCCGGTTC CCAACGATCA AGGCGAGTTA CATGATCCCC 13380 

CATGTTGTGC AAAAAAGCGG TTAGCTCCTT CGGTCCTCCG ATCGTTGTCA GAAGTAAGTT 13440

GGCCGCAGTG TTATCACTCA TGGTTATGGC AGCACTGCAT AATTCTCTTA CTGTCATGCC 13500

ATCCGTAAGA TGCTTTTCTG TGACTGGTGA GTACTCAACC AAGTCATTCT GAGAATAGTG 13560

TATGCGGCGA CCGAGTTGCT CTTGCCCGGC GTCAATACGG GATAATACCG CGCCACATAG 13620

CAGAACTTTA AAAGTGCTCA TCATTGGAAA ACGTTCTTCG GGGCGAAAAC TCTCAAGGAT 13680 

CTTACCGCTG TTGAGATCCA GTTCGATGTA ACCCACTCGT GCACCCAACT GATCTTCAGC 13740 

ATCTTTTACT TTCACCAGCG TTTCTGGGTG AGCAAAAACA GGAAGGCAAA ATGCCGCAAA 13800

AAAGGGAATA AGGGCGACAC GGAAATGTTG AATACTCATA CTCTTCCTTT TTCAATATTA 13860

TTGAAGCATT TATCAGGGTT ATTGTCTCAT GAGCGGATAC ATATTTGAAT GTATTTAGAA 13920

AAATAAACAA ATAGGGGTTC CGCGCACATT TCCCCGAAAA GTGCCACCTG ACGTCTAAGA 13980

AACCATTATT ATCATGACAT TAACCTATAA AAATAGGCGT ATCACGAGGC CCTTTCGTCT 14040 

TCAAGAACTG CCTCGCGCGT TTCGGTGATG ACGGTGAAAA CCTCTGACAC ATGCAGCTCC 14100 
CGGAGACGGT CACAGCTTGT CTGTAAGCGG ATGCCGGGAG CAGACAAGCC CGTCAGGGCG 14160

CGTCAGCGGG TGTTGGCGGG TGTCGGGGCG CAGCCATGAC CCAGTCACGT AGCGATAGCG 14220

GAGTGTACTG GCTTAACTAT GCGGCATCAG AGCAGATTGT ACTGAGAGTG CACCATATGC 14280

GGTGTGAAAT ACCGCACAGA TGCGTAAGGA GAAAATACCG CATCAGGCGC CATTCGCCAT 14340

TCAGGCTGCG CAACTGTTGG GAAGGGCGAT CGGTGCGGGC CTCTTCGCTA TTACGCCAGC 14400

GCGGGGAGGC AGAGATTGCA GTAAGCTGAG ATCGCAGCAC TGCACTCCAG CCTGGGCGAC 14460

AGAGTAAGAC TCTGTCTCAA AAATAAAATA AATAAATCAA TCAGATATTC CAATCTTTTC 14520

CTTTATTTAT TTATTTATTT TCTATTTTGG AAACACAGTC CTTCCTTATT CCAGAATTAC 14580

ACATATATTC TATTTTTCTT TATATGCTCC AGTTTTTTTT AGACCTTCAC CTGAAATHTG 14640

TGTATACAAA ATCTAGGCCA GTCCAGCAGA GCCTAAAGGT AAAAAATAAA ATAATAAAAA 14700

ATAAATAAAA TCTAGCTCAC TCCTTCACAT CAAAATGGAG ATACAGCTGT TAGCATTAAA 14760

TACCAAATAA CCCATCTTGT CCTCAATAAT TTTAAGCGCC TCTCTCCACC ACATCTAACT 14820

CCTGTCAAAG GCATGTGCCC CTTCCGGGCG CTCTGCTGTG GTGCCAACCA ACTGGCATGT 14880

GGACTCTGCA GGGTCCCTAA CTGCCAAGCC CCACAGTGTG CCCTGAGGCT G C* C C CTT C f T 14940

TCTAGCGGCT GCCCCCACTC GGCTTTGCTT TCCCTAGTTT CAGTTACTTG CGTTCAGCCA 15000

AGGTCTGAAA CTAGGTGCGC ACAGAGCGGT AAGACTGCGA GAGAAAGAGA CCAGCTTTAC 15060

AGGGGGTTTA TCACAGTGCA CCCTGACAGT CGTCAGCCTC ACAGGGGGTT TATCACATTG 15120

CACCCTGACA GTCGTCAGCC TCACAGGGGG TTTATCACAG TGCACCCTTA CAATCATTCC 15180

ATTTGATTCA CAATTTTTTT AGTCTCTACT GTGCCTAACT TGTAAGTTAA ATTTGATCAG 15240

AGGTGTGTTC CCAGAGGGGA AAACAGTATA TACAGGGTTC AGTACTATCG CATTTCAGGC 15300

CTCCACCTGG GTCTTGGAAT GTGTCCCCCG AGGGGTGATG ACTACCTCAG TTGGATCTCC 15360

ACAGGTCACA GTGACACAAG ATAACCAAGA CACCTCCCAA GGCTACCACA ATGGGCCGCC 15420

CTCCACGTGC ACATGGCCGG AGGAACTGCC ATGTCGGAGG TGCAAGCACA CCTGCGCATC 15480

AGAGTCCTTG GTGTGGAGGG AGGGACCAGC GCAGCTTCCA GCCATCCACC TGATGAACAG 15540

AACCTAGGGA AAGCCCCAGT TCTACTTACA CCAGGAAAGG C 15581




(2) INFORMATION FOR SEQ ID NO:36:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 74 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..74
(D) OTHER INFORMATION: /note= "primer #17982"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:
GGGGCGTACG GAGCGCTCCG AATTCGGTAC CGTTTAAACG GGCCCTCTCG AGTCCGTTGT
ACAGTTCATC CATG



(2) INFORMATION FOR SEQ ID NO:37:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 66 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..66
(D) OTHER INFORMATION: /note= "primer #17983"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:
GGGGGAATTC GCGCGCGTAC CTAAGCGCTA GCTGAGCAAG AAATGGCTAG CAAAGGAGAA
GAACTC
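
The LENGTH fields declared for SEQ ID NO:36 and SEQ ID NO:37 can be checked against the printed primer sequences. The following is a minimal Python sketch of such a check; the names `PRIMERS`, `normalize` and `check_length` are illustrative assumptions and do not appear in the specification, and the sequences are simply copied from the two records above.

```python
# Primer sequences copied from SEQ ID NO:36 and SEQ ID NO:37 as printed
# (10-base groups, each sequence wrapped over two lines in the listing).
# All identifiers below are hypothetical helper names for illustration.
PRIMERS = {
    "SEQ ID NO:36 (primer #17982)": (
        "GGGGCGTACG GAGCGCTCCG AATTCGGTAC CGTTTAAACG GGCCCTCTCG AGTCCGTTGT "
        "ACAGTTCATC CATG",
        74,  # declared LENGTH in base pairs
    ),
    "SEQ ID NO:37 (primer #17983)": (
        "GGGGGAATTC GCGCGCGTAC CTAAGCGCTA GCTGAGCAAG AAATGGCTAG CAAAGGAGAA "
        "GAACTC",
        66,  # declared LENGTH in base pairs
    ),
}


def normalize(grouped: str) -> str:
    """Drop the spacing between 10-base groups and upper-case the result."""
    return "".join(grouped.split()).upper()


def check_length(grouped: str, declared: int) -> bool:
    """True when the sequence holds only A/C/G/T and matches the declared length."""
    seq = normalize(grouped)
    return set(seq) <= set("ACGT") and len(seq) == declared


for name, (grouped, declared) in PRIMERS.items():
    ok = check_length(grouped, declared)
    print(f"{name}: {len(normalize(grouped))} bases, matches LENGTH: {ok}")
```

The same whitespace-stripping step could be applied to the longer plasmid listing, where each full row carries six 10-base groups and a running position count that advances by 60.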
WHAT IS CLAIMED IS:

1. An isolated nucleic acid that encodes an engineered Aequorea victoria fluorescent protein, wherein the protein encoded by the isolated nucleic acid is selected from the group that consists of:

a. a protein that has leucine at amino acid position 65, and wherein said protein has a cellular fluorescence that is at least five times greater than the cellular fluorescence of wild type Aequorea victoria green fluorescent protein;

b. a protein that has leucine at amino acid position 65 and threonine at position 168, and wherein said protein has a cellular fluorescence that is at least five times greater than wild type Aequorea victoria green fluorescent protein;

c. a protein that has leucine at amino acid position 65, threonine at position 168, and cysteine at position 66, wherein said protein has a cellular fluorescence that is at least five times greater than the cellular fluorescence of wild type Aequorea victoria green fluorescent protein;

d. a blue fluorescent protein that has histidine at amino acid position 67, leucine at position 65 and has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His);

e. a blue fluorescent protein that has histidine at amino acid position 67, alanine at amino acid position 164 and has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His);

f. a blue fluorescent protein that has histidine at amino acid position 67, leucine at amino acid position 65, alanine at amino acid position 164 and has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His).

2. An isolated nucleic acid of claim 1, which encodes an engineered Aequorea victoria green fluorescent protein ("GFP") having a cellular fluorescence that is at least five times greater than that of wild type GFP, wherein the engineered GFP has a leucine at amino acid position 65.

3. An isolated nucleic acid according to claim 2, wherein the nucleic acid further encodes a threonine at amino acid position 168.

4. An isolated nucleic acid according to claim 3, wherein the nucleic acid further encodes a cysteine at amino acid position 66.

5. An isolated nucleic acid of claim 1 that encodes an engineered blue fluorescent protein ("BFP") that has histidine at amino acid position 67 and leucine at position 65, and has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His).

6. An isolated nucleic acid of claim 1 that encodes an engineered blue fluorescent protein ("BFP") that has histidine at amino acid position 67 and alanine at amino acid position 164, and has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His).

7. An isolated nucleic acid according to claim 6, wherein the nucleic acid further encodes leucine at amino acid position 65.

8. A transformed cell that expresses a protein encoded by a nucleic acid of claim 1.

9. A vector comprising a nucleic acid of claim 1.

10. A transformed cell comprising a vector of claim 9.

11. A transformed cell that expresses a protein encoded by the nucleic acid of claim 1 fused to a protein encoded by a second nucleic acid of interest.

12. An isolated engineered Aequorea victoria green fluorescent protein ("GFP") wherein the engineered GFP comprises leucine at amino acid position 65, said engineered GFP having a cellular fluorescence that is at least five times greater than wild type GFP.

13. An isolated engineered Aequorea victoria green fluorescent protein ("GFP") according to claim 12, wherein the engineered GFP has threonine at amino acid position 168.

14. An isolated engineered Aequorea victoria green fluorescent protein ("GFP") according to claim 13, wherein the engineered GFP has cysteine at amino acid position 66.

15. An isolated blue fluorescent protein ("BFP") that comprises histidine at amino acid position 67 and leucine at amino acid position 65 and has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His).

16. An isolated blue fluorescent protein ("BFP") that has a histidine at amino acid position 67 and an alanine at amino acid position 164, that has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His).

17. An isolated blue fluorescent protein ("BFP") according to claim 16, wherein the BFP further has leucine at amino acid position 65.

18. A method of detecting and optionally isolating an engineered cell that contains a selected nucleic acid which encodes a selected protein or nucleic acid, comprising:

a) stably introducing into a host cell in a population of host cells a vector that contains a first nucleic acid which encodes a polypeptide selected from the group consisting of SG11, SG12, SG25, SB42, SB49, SB50 and a second nucleic acid which encodes a selected protein or nucleic acid, and

b) detecting cells in the population of host cells that express SG11, SG12, SG25, SB42, SB49, or SB50, and

c) optionally sorting cells that express SG11, SG12, SG25, SB42, SB49, or SB50 with a fluorescence-activated cell sorter to isolate individual cells that express said fluorescent protein.



19. A nucleic acid construct wherein a coding sequence selected from the group consisting of sequences that encode SG11, SG12, SG25, SB42, SB49, and SB50 is operably linked to a regulatory sequence of a selected gene.

20. A nucleic acid construct wherein a first coding sequence that encodes a selected polypeptide is fused using genetic engineering to a second coding sequence selected from the group consisting of sequences that encode SG11, SG12, SG25, SB42, SB49, and SB50, such that expression of the fused sequence yields a fluorescent hybrid protein in which the polypeptide encoded by the first coding sequence is fused to the polypeptide encoded by the second coding sequence.

21. A method of detecting and characterizing regulatory and coding sequence elements that regulate subcellular expression and targeting of proteins, comprising:

a) expressing in an engineered cell, in the presence and absence of selected culture conditions and components, a nucleic acid wherein a first nucleic acid selected from the group consisting of nucleic acids that encode SG11, SG12, SG25, SB42, SB49, and SB50 is operably linked to a second nucleic acid derived from a selected gene;

b) detecting the presence and subcellular localization of fluorescent signal.
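
The residue combinations recited in limbs a to f of claim 1 can be tabulated for bookkeeping purposes. The following Python sketch is offered only as an illustration under stated assumptions: the table layout, the names `Limb`, `CLAIM_1_LIMBS` and `matching_limbs`, and the idea of supplying a measured fold increase as a number are all hypothetical and are not taken from the specification. Positions and amino acids are transcribed from claim 1, and the five-fold threshold is applied against the reference protein named in each limb.

```python
from typing import Dict, List, NamedTuple


class Limb(NamedTuple):
    residues: Dict[int, str]  # position -> required one-letter amino acid code
    reference: str            # protein against which fluorescence is compared


# Labels for the two comparison proteins named in claim 1 (illustrative strings).
WT_GFP = "wild type GFP"
BFP_REF = "BFP(Tyr67->His)"

# Residue requirements transcribed from limbs a-f of claim 1.
CLAIM_1_LIMBS: Dict[str, Limb] = {
    "a": Limb({65: "L"}, WT_GFP),
    "b": Limb({65: "L", 168: "T"}, WT_GFP),
    "c": Limb({65: "L", 66: "C", 168: "T"}, WT_GFP),
    "d": Limb({65: "L", 67: "H"}, BFP_REF),
    "e": Limb({67: "H", 164: "A"}, BFP_REF),
    "f": Limb({65: "L", 67: "H", 164: "A"}, BFP_REF),
}

MIN_FOLD = 5.0  # "at least five times greater"


def matching_limbs(residues: Dict[int, str], reference: str,
                   fold_increase: float) -> List[str]:
    """Return the limbs whose residues, reference and threshold are all met."""
    if fold_increase < MIN_FOLD:
        return []
    return [
        name
        for name, limb in CLAIM_1_LIMBS.items()
        if limb.reference == reference
        and all(residues.get(pos) == aa for pos, aa in limb.residues.items())
    ]


print(matching_limbs({65: "L", 168: "T"}, WT_GFP, 20.0))           # ['a', 'b']
print(matching_limbs({65: "L", 67: "H", 164: "A"}, BFP_REF, 6.0))  # ['d', 'e', 'f']
print(matching_limbs({65: "L"}, WT_GFP, 3.0))                      # []
```

A variant carrying more than the minimum residues matches every limb whose requirements it contains, which is why the second example returns three limbs.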
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